Abstract

A pattern of a sequence is a sequence of integer indices with each index describing the order 
of first occurrence of the respective symbol in the original sequence. In a recent paper, tight 
general bounds on the block entropy of patterns of sequences generated by independent and 
identically distributed (i.i.d.) sources were derived. In this paper, precise approximations are 
provided for the pattern block entropies for patterns of sequences generated by i.i.d. uniform 
and monotonic distributions, including distributions over the integers, and the geometric dis- 
tribution. Numerical bounds on the pattern block entropies of these distributions are provided 
even for very short blocks. Tight bounds are obtained even for distributions that have infinite 
i.i.d. entropy rates. The approximations are obtained using general bounds and their derivation 
techniques. Conditional index entropy is also studied for distributions over smaller alphabets. 
Index Terms: patterns, monotonic distributions, uniform distributions, entropy. 



1 Introduction

Recent work (see, e.g., [1], [5], [6], [10], [13], [2]) has considered universal compression for patterns
of independent and identically distributed (i.i.d.) sequences. The pattern of a sequence
$x^n \triangleq (x_1, x_2, \ldots, x_n)$ is a sequence $\psi^n \triangleq \Psi(x^n)$ of pointers that point to the actual alphabet letters,
where the alphabet letters are assigned indices in order of first occurrence. For example, the pattern
of all sequences $x^n$ = lossless, $x^n$ = sellsoll, $x^n$ = 12331433, and $x^n$ = 76887288, which is alphabet
independent, is $\psi^n = \Psi(x^n) = 12331433$. Capital $\Psi(\cdot)$ denotes the pattern operator.
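
The pattern operator described above can be sketched in a few lines of Python. This is only an illustration; the function name `pattern` is an assumption introduced here and does not appear in the paper.

```python
def pattern(sequence):
    """Return the pattern of a sequence as a list of 1-based indices.

    Each symbol is replaced by the order in which it first occurred.
    """
    first_seen = {}  # symbol -> index assigned at its first occurrence
    result = []
    for symbol in sequence:
        if symbol not in first_seen:
            first_seen[symbol] = len(first_seen) + 1
        result.append(first_seen[symbol])
    return result


if __name__ == "__main__":
    # The four sequences from the text all share one pattern.
    for x in ("lossless", "sellsoll", "12331433", "76887288"):
        print(x, "->", "".join(str(i) for i in pattern(x)))
```

All four example sequences map to the same index sequence, which illustrates that the pattern is independent of the alphabet.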

Patterns are interesting in universal compression with unknown alphabets, where the dictionary 
and the pattern of x n can be compressed separately (see, e.g., pQ). Pattern entropy is also important 
in learning applications. Consider all the new species an explorer observes. The explorer can 
identify these species with the first time each was seen, and assign indices to species in order of 
first occurrence. The entropy of patterns can thus model uncertainty of such processes. 



* Supported in part by NSF Grant CCF-0347969. Parts of the material in this paper were presented at the IEEE 
International Symposium on Information Theory, Seattle, WA, USA, July, 2006. 
Initial work on patterns [5], [6], [10], [13], [14] focused on showing diminishing universal
compression redundancy rates. The first results on pattern entropy in [10], [12], [11], however, showed
that for sufficiently large alphabets, the pattern block entropy must decrease from the i.i.d. one
even more significantly than the universal coding penalty for coding patterns. Since $\Psi(x^n)$ is the
result of data processing, its entropy must be no greater than $H_\theta(X^n)$. For alphabet size $k$,

$$ nH_\theta(X) - \log\left[ k! / (\max\{0, k-n\})! \right] \le H_\theta(\Psi^n) \le nH_\theta(X), \qquad (1) $$

where capital letters denote random variables, and $\boldsymbol{\theta}$ is the parameter vector governing the source.$^1$
The bounds in (1) already show that for $k = o(n)$ the pattern entropy rate equals the i.i.d. one$^2$
for non-diminishing $H_\theta(X)$. Subsequently to the results in [12], it was independently shown in [3]
and [7] that for discrete i.i.d. sources, the pattern entropy rate is equal to that of the underlying
i.i.d. process.
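
For very short blocks over tiny alphabets, the bounds $nH_\theta(X) - \log[k!/(\max\{0,k-n\})!] \le H_\theta(\Psi^n) \le nH_\theta(X)$ can be checked by exhaustive enumeration. The sketch below is not the method of the paper: it is a brute-force check whose cost grows as $k^n$, and all function names are assumptions made for illustration.

```python
import itertools
import math
from collections import defaultdict


def pattern(seq):
    """Pattern of a sequence as a tuple of 1-based first-occurrence indices."""
    first_seen = {}
    return tuple(first_seen.setdefault(s, len(first_seen) + 1) for s in seq)


def iid_entropy(theta):
    """Per-symbol i.i.d. entropy H(X) in bits."""
    return -sum(t * math.log2(t) for t in theta if t > 0)


def pattern_block_entropy(theta, n):
    """Exact H(Psi^n) in bits, by enumerating all k^n sequences."""
    probs = defaultdict(float)
    for x in itertools.product(range(len(theta)), repeat=n):
        probs[pattern(x)] += math.prod(theta[i] for i in x)
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def bounds_eq1(theta, n):
    """Lower and upper bounds of (1), in bits."""
    k = len(theta)
    upper = n * iid_entropy(theta)
    # log2( k! / (max{0, k-n})! ) computed through log-gamma
    log_ratio = (math.lgamma(k + 1) - math.lgamma(max(0, k - n) + 1)) / math.log(2)
    return upper - log_ratio, upper


if __name__ == "__main__":
    theta, n = [0.5, 0.3, 0.2], 6
    low, high = bounds_eq1(theta, n)
    print(low, pattern_block_entropy(theta, n), high)
```

The enumeration is only feasible for toy cases, which is precisely why bounds of this kind are needed for realistic $n$ and $k$.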

In contrast with the block entropy, for smaller alphabets, the conditional next index entropy
$H_\theta(\Psi_\ell \mid \Psi^{\ell-1})$ is guaranteed to start increasing from $H_\theta(X)$ after some time $\ell > 1$. The gain
(decrease) in block entropy is thus due to first occurrences of new symbols, and gains only occur
before first occurrences become sparse. This observation, pointed out also in [1], [7], gives rise to a
possibility of diminishing $o(1)$ overall per block decreases of $H_\theta(\Psi^n)$ from $nH_\theta(X)$.

In [11], general tight upper and matching lower bounds on the block entropy were derived.
This paper continues the work in [11], and uses the bounds derived in [11] and their derivation
methods to provide very accurate approximations of the pattern block entropies for uniform and
several monotonic i.i.d. distributions. The complete range of uniform distributions, from those over
fixed small alphabets to those over infinite alphabets, is studied. Monotonic distributions from slowly
to fast decaying ones are considered. It is shown that the pattern entropy can be approximated even for
slowly decaying monotonic distributions with infinite i.i.d. entropy rates. Then, small alphabets
and their conditional next index entropies are studied.

The derivation methods are based on those in [11]. The probability space is partitioned into a
grid of points. Between each two points, there is a bin. Symbols whose probabilities lie in the same
bin can be exchanged in $x^n$ to provide another sequence $x'^n$ with $\Psi(x^n) = \Psi(x'^n)$ and almost equal
probability. Counting such sequences, packing low probability symbols into single point masses,
leads to bounds on $H_\theta(\Psi^n)$. Proper choices of grids are key for tightening bounds.

Section 2 gives some preliminaries. General bounds (somewhat modified) from [11] are reviewed
in Section 3. Next, Section 4 summarizes pattern entropies for different distributions. Finally,
Sections 5 and 6 contain the proofs for uniform distributions and monotonic distributions, respectively.



2 Preliminaries 

Let $x^n$ be an $n$-tuple with components $x_i \in \Sigma = \{1, 2, \ldots, k\}$ (where the alphabet is defined
without loss of generality). The asymptotic regime is that $n \to \infty$. However, the general bounds
are stated also for finite $n$. The alphabet size $k$ may be greater than $n$ or infinite. The vector
$\boldsymbol{\theta} \triangleq (\theta_1, \theta_2, \ldots, \theta_k)$ is the set of probabilities of all letters in $\Sigma$. Assume, without loss of generality,
that $\theta_1 \le \theta_2 \le \cdots \le \theta_k$. Boldface letters denote vectors, and capital letters will denote random
variables. The probability of $\psi^n$ induced by an i.i.d. source is

$^1$ Logarithms are taken to base 2, here and elsewhere. The natural logarithm is denoted by $\ln$.

$^2$ For two functions $f(n)$ and $g(n)$, $f(n) = o(g(n))$ if $\forall c > 0$, $\exists n_0$, such that, $\forall n > n_0$, $|f(n)| < c\,|g(n)|$; $f(n) = O(g(n))$
if $\exists c, n_0$, such that, $\forall n > n_0$, $0 \le |f(n)| \le c\,|g(n)|$; with inequalities it will be assumed that $f(n) \ge 0$, but with
equalities, negative $f(n)$ are possible; $f(n) = \Theta(g(n))$ if $\exists c_1, c_2, n_0$, such that, $\forall n > n_0$, $c_1 g(n) \le f(n) \le c_2 g(n)$.



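$$ P_\theta(\psi^n) = \sum_{y^n :\, \Psi(y^n) = \psi^n} P_\theta(y^n). \qquad (2) $$

The pattern sequence or block entropy of order $n$ is

$$ H_\theta(\Psi^n) \triangleq -\sum_{\psi^n} P_\theta(\psi^n) \log P_\theta(\psi^n). \qquad (3) $$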



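Following [11] (but more generally), consider two different grids: $\eta$ and $\xi$. For simplicity of
notation, we omit the dependence on $n$ from definitions of grid points. Let $\varepsilon_0$, $\varepsilon_1$, and $\varepsilon_2$ be three
numbers that satisfy $\varepsilon_0 \ge \max(0, \varepsilon_1)$ and $\varepsilon_2 \ge \max(0, \varepsilon_1)$. Define

$$ \eta'_b \triangleq \sum_{j=1}^{b} \frac{2\left(j - \frac{1}{2}\right)}{n^{1+\varepsilon_2}} = \frac{b^2}{n^{1+\varepsilon_2}}. $$

The grid $\eta \triangleq (\eta_0, \eta_1, \ldots, \eta_{B_\eta})$ is defined by $\eta_0 = 0$,

$$ \eta_1 = \frac{1}{n^{1+\varepsilon_0}}, \qquad \eta_2 = \frac{1}{n^{1+\varepsilon_1}}, \qquad b' \triangleq b + \left\lfloor n^{(\varepsilon_2 - \varepsilon_1)/2} \right\rfloor - 2, \qquad (4) $$

and

$$ \eta_b \triangleq \eta'_{b'} = \eta'_{\,b + \lfloor n^{(\varepsilon_2 - \varepsilon_1)/2} \rfloor - 2}, \qquad b = 3, 4, \ldots, B_\eta. \qquad (5) $$

For some $\varepsilon > 0$, if $\varepsilon_0 = -\varepsilon_1 = \varepsilon$ and $\varepsilon_2 = 2\varepsilon$, $\eta$ reduces to the one defined in [11]. The more
general definition here allows achieving tighter bounds for some specific distributions also for finite
$n$. The relation $\varepsilon_1 = -\varepsilon$ will be assumed by definition, but the other two parameters will not be
tied to $\varepsilon$ (except by $\varepsilon_i \ge -\varepsilon$). The grid $\xi \triangleq (\xi_0, \xi_1, \ldots, \xi_{B_\xi})$ is defined by $\xi_0 = 0$, and for an
arbitrarily small $\varepsilon = -\varepsilon_1 > 0$,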



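$$ \xi_b \triangleq \sum_{j=1}^{b} \frac{2(j - 0.5)}{n^{1-\varepsilon}} = \frac{b^2}{n^{1-\varepsilon}}, \qquad b = 1, 2, \ldots, B_\xi. \qquad (6) $$

For both grids, $\eta_{B_\eta + 1} = \xi_{B_\xi + 1} = 1$, and thus $B_\eta = \sqrt{n^{1+\varepsilon_2}} - \lfloor n^{(\varepsilon_2 - \varepsilon_1)/2} \rfloor + 2$, and $B_\xi = \sqrt{n^{1-\varepsilon}}$.
We also define the maximal indices $A_\eta$ and $A_\xi$ whose grid points do not exceed 0.5 for $\eta$ and $\xi$,
respectively. Hence, $A_\eta = \lfloor \sqrt{n^{1+\varepsilon_2}} / \sqrt{2} \rfloor - \lfloor n^{(\varepsilon_2 - \varepsilon_1)/2} \rfloor + 2$ and $A_\xi = \lfloor \sqrt{n^{1-\varepsilon}} / \sqrt{2} \rfloor$.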

Let $k_b$, $b = 0, 1, \ldots, B_\eta$; and $\kappa_b$, $b = 0, 1, \ldots, B_\xi$; denote the numbers of symbols $\theta_i \in (\eta_b, \eta_{b+1}]$,
and $\theta_i \in (\xi_b, \xi_{b+1}]$, respectively (in bin $b$ of $\eta$ and $\xi$, respectively). Specifically, for given $\varepsilon_0$, $\varepsilon$, and
$\varepsilon_2$, $k_0$ and $k_1$ denote the cardinalities of $\theta_i \le 1/n^{1+\varepsilon_0}$, and $\theta_i \in (1/n^{1+\varepsilon_0}, 1/n^{1-\varepsilon}]$, respectively.
Define also $k_{01} = k_0 + k_1$, thus $k - k_{01}$ is the cardinality of $\theta_i > 1/n^{1-\varepsilon}$. Also, let $\kappa'_b$, $b = 1, 2, \ldots, B_\xi$;
be zero if $\kappa_b$ is zero, and otherwise, the number of symbols for which $\theta_i \in (\xi_{b-1}, \xi_{b+2}]$, with the
exception of $\kappa'_1$, which will only count letters for which $\theta_i \in (\xi_1, \xi_3]$. (There is clearly an overlap
between adjacent counters $\kappa'_b$, which is needed for one of the lower bounds.) Now, let

$$ \varphi_b \triangleq \sum_{\theta_i \in (\eta_b, \eta_{b+1}]} \theta_i \qquad (7) $$

be the total probability of bin $b$ of $\eta$. Specifically, $\varphi_0$, $\varphi_1$, and $\varphi_{01} = \varphi_0 + \varphi_1$ are defined with
respect to (w.r.t.) bins 0, 1, and 01, respectively.

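The mean occurrence count of letter $i$ in $X^n$ is given by $E_\theta N_x(i) = n\theta_i$, where $n_x(i)$ is the
occurrence count of $i$ in $x^n$, $N_x(i)$ is its random variable, and $E_\theta$ is expectation given $\boldsymbol{\theta}$. Let $K_b$
be a random variable counting the distinct symbols from bin $b$ of $\eta$ that occur in $X^n$. Let $K$ be
the total distinct letters occurring in $X^n$. Then, let

$$ L_b \triangleq E_\theta[K_b] = \sum_{i:\, \theta_i \in (\eta_b, \eta_{b+1}]} P_\theta(i \in X^n) = \sum_{\theta_i \in (\eta_b, \eta_{b+1}]} \left[ 1 - (1 - \theta_i)^n \right] \qquad (8) $$

and also define $L \triangleq E_\theta[K]$ similarly. Substituting $(1 - \theta_i)^n = \exp\{n \ln(1 - \theta_i)\}$ and using Taylor
series expansion,

$$ k_b - \sum_{\theta_i \in (\eta_b, \eta_{b+1}]} e^{-n\theta_i} \le L_b \le k_b - \sum_{\theta_i \in (\eta_b, \eta_{b+1}],\ \theta_i \le 3/5} e^{-n(\theta_i + \theta_i^2)}. \qquad (9) $$

Specifically, using Binomial expansion for bin $b = 0$,

$$ n\varphi_0 - \binom{n}{2} \sum_{i=1}^{k_0} \theta_i^2 \le L_0 \le n\varphi_0 - \binom{n}{2} \sum_{i=1}^{k_0} \theta_i^2 + \binom{n}{3} \sum_{i=1}^{k_0} \theta_i^3. \qquad (10) $$

Similar bounds can be obtained for bin $b = 1$ if $\varepsilon_1 > 0$ ($\varepsilon < 0$).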
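
The expected number of distinct letters from one bin, $\sum_i [1 - (1-\theta_i)^n]$, and the exponential bounds in (9) can be compared numerically. The following sketch assumes the bin is given simply as a list of its letter probabilities; the function names and the example values are illustrative assumptions, not taken from the paper.

```python
import math


def expected_distinct(thetas, n):
    """Exact L_b: expected number of distinct letters of the bin seen in n draws."""
    return sum(1.0 - (1.0 - t) ** n for t in thetas)


def bounds_eq9(thetas, n):
    """Lower and upper bounds on L_b in the form of (9)."""
    k_b = len(thetas)
    # (1 - t)^n <= exp(-n t) gives the lower bound on L_b.
    lower = k_b - sum(math.exp(-n * t) for t in thetas)
    # ln(1 - t) >= -t - t^2 for t <= 3/5 gives the upper bound; letters with
    # t > 3/5 are simply bounded by 1 (their term is left out of the sum).
    upper = k_b - sum(math.exp(-n * (t + t * t)) for t in thetas if t <= 0.6)
    return lower, upper


if __name__ == "__main__":
    n = 1000
    bin_probs = [0.002, 0.0025, 0.003]  # hypothetical bin contents
    print(bounds_eq9(bin_probs, n), expected_distinct(bin_probs, n))
```

For probabilities of order $1/n$ the two bounds are very close to each other, since $n\theta_i^2$ is then small.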



3 General Bounds 

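General bounds based on [11] are summarized here. First, for given $\varepsilon_0$, $\varepsilon$, and $\varepsilon_2$, that determine
$\eta$ and $\xi$, define

$$ H_\theta^{(01)}(X) \triangleq -\varphi_{01} \log \varphi_{01} - \sum_{i=k_{01}+1}^{k} \theta_i \log \theta_i, \qquad (11) $$

$$ H_\theta^{(0,1)}(X) \triangleq -\sum_{b=0}^{1} \varphi_b \log \varphi_b - \sum_{i=k_{01}+1}^{k} \theta_i \log \theta_i. \qquad (12) $$

The i.i.d. entropies above pack low probabilities into one or two point masses. The following lower
bounds were derived in [11].

Theorem 1 Let $\varepsilon > 0$, $\varepsilon_0 > 0$, and define $\xi$ with (6). Define $Z^n \triangleq (Z_1, Z_2, \ldots, Z_n)$ by $Z_j = 0$ if
$\theta_{x_j} \le 1/n^{1-\varepsilon}$, and 1 otherwise. Let $k_\vartheta^-$ be the count of letters $i$ such that $\theta_i \in (\vartheta^-/n^{1-\varepsilon}, 1/n^{1-\varepsilon}]$
and $k_\vartheta^+$ the count of letters $i$ with $\theta_i \in (1/n^{1-\varepsilon}, \vartheta^+/n^{1-\varepsilon}]$, where $\vartheta^-$, $\vartheta^+$ are constants that satisfy
$\vartheta^+ > 1 > \vartheta^- > 0$. Then,

$$ H_\theta(\Psi^n) \ge nH_\theta^{(01)}(X) - S_1 + S_2 + S_3 - S_4 \qquad (13) $$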
where

$$ S_1 \le \log (k - k_{01})! \qquad (14) $$
Si < (l-eji^logKO + ^-^Ologsl+^log^-^oOI + ^Iniin^O.B)] (15) 



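$$ S_1 \le (1 - \varepsilon_n) \sum_{b \ge 1} \log(\kappa'_b!) + \varepsilon_n \log (k - k_{01})! + h_2[\min(\varepsilon_n, 0.5)] \qquad (16) $$

$$ \varepsilon_n = \min\left\{ n \cdot (k - k_{01}) \cdot e^{-0.1 n^{\varepsilon}},\ 1 \right\} \qquad (17) $$

where $h_2(\alpha) = -\alpha \log \alpha - (1 - \alpha) \log(1 - \alpha)$,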

fcoi 

S 2 = ^EelN^ij-PeiteX^log^- (18) 

«=i * 

S 2 > ^[n^-l + e-"(^ 2 )]log^ (19) 
i=l * 

s ^ - 3^ -I) T £ * - X + £ K - 1 + 108 if (20> 

v 7 i=l 1 i=k +l % 

5 3 >(loge) V (21) 
5 4 < min (nMwi), (1-4) k+^] + £ > + ^2 [min (4,0.5)]} (22) 



and 



where 



£ n-™ %>n-3 e 2(i?- - l)rt 1+e ' ^ j 

kg i>n -3 denotes the total symbol count with 9i > 1/n 3 , and 

/(^ + )=-in{^ln^ + l} (24) 

where the minimum is taken between the values of the expression for -d~ and for i9 + . 

Fix $\delta > 0$, let $n \to \infty$, and $\varepsilon \ge (1 + \delta)(\ln \ln n)/(\ln n)$. Then, $\varepsilon_n = o(1)$, $\varepsilon'_n = o(1)$, and all terms
but the leading ones in (15), (16) and in the second argument of the minimum in (22) are $o(1)$.

Second order terms are described in Theorem 1 more explicitly than in [11], and some terms are
tightened (in second order) to allow use of the theorem for practical $n$ in Section 6 (derivations of
the explicit terms do follow [11]). This is specifically for cases where very slow rates are obtained
for the gaps between the i.i.d. and pattern entropies, such as the geometric distribution. Term $S_1$
is the decrease in $H_\theta(\Psi^n)$ due to first occurrences of symbols with $\theta_i > 1/n^{1-\varepsilon}$, which results from
indistinguishability among indices of letters in the same bin $b \ge 1$ of $\xi$. Term $S_2$ is the cost of
re-occurrences of letters with "small" probabilities. Term $S_3$ is the penalty in first occurrences of
"small" probability symbols beyond a single point mass. The bound in (21) is under a worst case
assumption. Term $S_4$ is a correction from separation between "small" and "large" probabilities.
Specifically, for iT = e" 5 - 5 ps 0.004 and i?+ = e L4 ss 4.06, > 0.5, and the last term of 

([23]) is upper bounded by 2.77/n 1+£ . 

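The following upper bounds generalize the derivations in [11].

Theorem 2 Let $\varepsilon_0$, $\varepsilon_1$, $\varepsilon_2$, $\eta$, $b$, and $b'$ be as in (4)-(5), and let $\varepsilon = -\varepsilon_1$. Then,

$$ H_\theta(\Psi^n) \le nH_\theta^{(0,1)}(X) - U + R'_0 + R'_1 \qquad (25) $$

$$ H_\theta(\Psi^n) \le nH_\theta^{(01)}(X) - U + R'_{01} \qquad (26) $$

where $U \ge 0$, and also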

Av ( T 1 

U > ^maxjo, L b log^, (l - min {l, k b e~ n,lb }) log(k b \) > 

b \ 1± ^po il j, h (27) 

6>2,fc b >l 



$$ R'_b \le (n\varphi_b - L_b) \log[\min\{k_b, n\}] + n\varphi_b \cdot h_2\left( \frac{L_b}{n\varphi_b} \right), \qquad b = 0, 1, 01, \qquad (28) $$

where (28\) decreases with L b for 6 = and a/so /or 6 = 1, 01 ?/ either e < or k\, koi > (1 + 5) 
/or some <5 > 0, respectively. Also, 




£ »? log^^ (29) 



9 / ^ 2 1 ^0 \— ^ 

i:«i6(i76,iJM-i] / "^i^efe^+i]^ 

/or 6 = 0, and a/so /or 6 = 1, and 6 = 01 if £ <0, where r/01 = r?o = 0, and r?oi+i = "2- 



Fix $\delta > 0$, let $n \to \infty$, and $\varepsilon, \varepsilon_2 \ge (1 + \delta)(\ln \ln n)/(\ln n)$. Then, (27) is

$$ U \ge (1 - o(1)) \sum_{b=2}^{A_\eta} \log(k_b!). \qquad (30) $$



The bounds of (25)-(26) consist of 1) an i.i.d. entropy which packs low probabilities into one or
two point masses, 2) a correction term $U$, expressing the gain in first occurrences of symbols with
$\theta_i > 1/n^{1-\varepsilon}$, 3) losses in packing low probabilities into single point masses ($R'_b$ terms). Theorem 2
compacts the representation of several bounds in [11] by allowing negative $\varepsilon$. This also generalizes
the upper bounds in [11] because two separate bins with probabilities asymptotically smaller than
$1/n$ can be created. This is useful in obtaining tighter bounds for fast decaying distributions, such
as geometric distributions (see Section 6). The proof of the generalization is identical to the proof
in [11]. Probability is sequentially assigned to the joint index-bin sequence $(\psi^n, \beta^n)$. Repetitions
are assigned the mean bin probability, and first occurrences of an index in a bin are assigned the
remaining bin probability. In bins 0 and 1 (or bin 01), repetitions are assigned smaller probabilities
(which are optimized), and first occurrences thus greater remaining bin probability. The average
description length of this code bounds the pattern block entropy (see [11] for details). The bound in
(27) uses the better decrease in the pattern entropy that can be obtained in each large probability
bin. The second (second order) term is the quantization cost in all bins. The coefficient is tightened
from [11] based on (9) in [11] to allow tighter bounds for finite $n$.
4 Bounds for Some Distributions



4.1 Uniform Distributions 

The pattern entropy is bounded below for the complete range of uniform distributions. Applications
such as compression with words as the single alphabet unit can have alphabets of $k = O(n)$ or
larger. Pattern entropy for uniform distributions with $k = O(n)$ or larger is also interesting in
applications of population estimation from limited observations (see, e.g., [8]). For uniform
distributions, all symbol probabilities are in the same bin (also, unlike other cases, $H_\theta(\Psi_\ell \mid \Psi^{\ell-1}) =
H_\theta(\Psi_\ell \mid X^{\ell-1})$). This guarantees a maximal decrease of the pattern entropy from the i.i.d. one
for alphabets of $k = o(n)$. For alphabets of $k = O(n)$, the analysis in [11] can be simplified to
derive tighter bounds due to the simplicity of the uniform distribution. First, however, the bounds
derived from the general bounds in Section 3 are given in the following corollary:

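Corollary 1 Let $\theta_i = \theta > 1/n^{1-\varepsilon}$, for $i = 1, 2, \ldots, 1/\theta = k$. Then,

$$ nH_\theta(X) - \log(k!) \le H_\theta(\Psi^n) \le nH_\theta(X) - \left( 1 - ke^{-n/k} \right) \log(k!). \qquad (31) $$

Let $\theta_i = \lambda/n$, for $i = 1, 2, \ldots, n/\lambda = k$, and a fixed $\lambda > 0$. Then,

$$ \left( 1 - \frac{1 - e^{-\lambda}}{\lambda} \right) n \log \frac{n}{\lambda} + \frac{\log e}{2} \cdot \frac{(1 - e^{-\lambda})^2}{\lambda} \cdot n - O(\log n) \le H_\theta(\Psi^n) \le \left( 1 - \frac{1 - e^{-\lambda}}{\lambda} \right) n \log\left[ \min\left\{ n, \frac{n}{\lambda} \right\} \right] + n \cdot h_2\left( \frac{1 - e^{-\lambda}}{\lambda} \right). \qquad (32) $$

Let $\theta_i = 1/n^{\mu+\varepsilon}$, for $i = 1, 2, \ldots, n^{\mu+\varepsilon} = k$, and $\mu \ge 1$. Then,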

(l-0 (-^^ + - J J log (en^) < He < log (2en^) . (33) 

Corollary 1 shows the decrease of the block entropy for a uniform i.i.d. distribution from the
original process to its pattern. While the i.i.d. entropy is always $n \log k$, the pattern entropy behaves
differently in three regions. For small $k = o(n)$, the decrease in the block entropy is only in the
second order, essentially by $\log(k/e)$ bits per probability parameter. In the other extreme, $k \gg n$,
the block entropy decreases in its first order rate by a factor of $2n^{\mu-1+\varepsilon}$ from the i.i.d. one. If
$\mu \ge 2$, while both the i.i.d. entropy rate and block entropy diverge, the pattern entropy for the
whole block diminishes. This is expected since for such distributions the only pattern one expects
to observe is $\psi^n = 123 \ldots n$. In the middle range ($k = O(n)$), the decrease is in the first order
coefficient. Specifically, for $\lambda = 1$, the bounds in (32) reduce to

$$ \frac{n}{e} \log n + \frac{\log e}{2} \left( 1 - \frac{1}{e} \right)^2 \cdot n - O(\log n) \le H_\theta(\Psi^n) \le \frac{n}{e} \log n + n \cdot h_2\left( \frac{1}{e} \right), \qquad (34) $$

which yield

$$ \frac{n}{e} \log n + 0.29n - O(\log n) \le H_\theta(\Psi^n) \le \frac{n}{e} \log n + 0.95n. \qquad (35) $$
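
The numerical coefficients 0.29 and 0.95 of the linear terms in (35) follow from evaluating $\frac{\log e}{2}(1 - 1/e)^2$ and $h_2(1/e)$ with base-2 logarithms. A short check, with variable names chosen here only for illustration:

```python
import math


def h2(a):
    """Binary entropy function in bits."""
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)


inv_e = 1 / math.e
lower_coeff = (math.log2(math.e) / 2) * (1 - inv_e) ** 2  # linear term, lower bound
upper_coeff = h2(inv_e)                                   # linear term, upper bound

print(round(lower_coeff, 2), round(upper_coeff, 2))  # prints: 0.29 0.95
```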

Thus, the first order gain (decrease) from the i.i.d. entropy is $(1 - \frac{1}{e}) n \log n$ bits. The decrease
is because not all letters occur in a sequence. The gain thus results from higher probabilities of
occurrence of new indices. The gaps between the lower and upper bounds in (32) and in
(35) affect only second order terms. However, tighter bounds for the middle range of uniform
distributions are possible. Due to the simplicity of the uniform distribution, some looser bounding
steps that are necessary to produce general bounds can be avoided. Theorem 3 provides tighter
bounds for the $\lambda/n$ uniform distribution.

Figure 1: Bounds on pattern per symbol entropy $H_\theta(\Psi^n)/n$ vs. $\lambda$ for uniform distributions with
$\theta_i = \lambda/n$, $\forall i$, for $n = 1000$ symbols.

Theorem 3 Let 9i = X/n, for i = 1,2, . . . ,n/X = k, and a fixed A > 0. Then, 



1 



n (e A - A - l) log e 

n log — H ^— j • n — O (log n) < 

A 



H e (^ n ) < 1 



logo + 



a 



nlog 
1 



max(l, A) 



Ae A 
log (a - 1) - 



mm - 



+ 



1 



log e 



a 



A 

1 1 

+ 



■ n + 



•log 



maxfl, A) , 
(a-l) + I 



max(l,A) ' A 
■ n + O (log n) , 



(36) 



where a > 1 is a parameter which can be optimized to minimize the upper bound. 



The bounds of (36) are tighter than those of (32). For a specific $\lambda$, the upper bound in (36) is
optimized by taking $\alpha > 1$ that gives a minimum. Specifically, for $\lambda = 1$,

$$ \frac{n}{e} \log n + 0.38n - O(\log n) \le H_\theta(\Psi^n) \le \frac{n}{e} \log n + 0.76n + O(\log n), \qquad (37) $$

where the best choice of $\alpha$ in (36) leading to (37) is $\alpha \approx 1.93$. In general, the smaller $\lambda$ is, the
greater the optimal $\alpha$. Figure 1 shows the bounds of (32) and (36) on $H_\theta(\Psi^n)$ as a function of $\lambda$. It
demonstrates the gaps between the i.i.d. block entropy and the pattern entropy, which increase
significantly as the alphabet grows. The bounds of (36) almost meet for larger $\lambda$.
4.2 Monotonic Distributions



While there exist processes for which the i.i.d. entropy cannot be bounded, the pattern block 
entropy, while it still increases with n (giving an infinite entropy rate), can be explicitly bounded. 



4.2.1 Slowly Decaying Distribution Over the Integers 

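Consider the distribution over the integers

$$ \theta_j = \frac{a}{j (\log j)^{1+\gamma}}, \qquad j = 2, 3, \ldots \qquad (38) $$

where $\gamma > 0$ and $a$ is a normalizing factor. Approximating $\sum_j \theta_j = 1$ by integrals

$$ \frac{1}{0.5 + \frac{1}{3 (\log 3)^{1+\gamma}} + \frac{\ln 2}{\gamma (\log 3)^{\gamma}}} \le a \le \frac{1}{0.5 + \frac{\ln 2}{\gamma (\log 3)^{\gamma}}}. \qquad (39) $$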

The distribution in (38) is particularly interesting for $0 < \gamma \le 1$, where $H_\theta(X) = \infty$. This was
used to demonstrate several points in [3], [7]. In particular, in [3], it was used to show that there
exist i.i.d. pattern processes with entropy whose order is greater than $\Theta(n (\log n)^{1-\delta})$ for every $\delta$;
$0 < \delta < 1$. Here, tight bounds approximate $H_\theta(\Psi^n)$ for the distribution in (38) for every $\gamma > 0$,
even for relatively small $n$. While $H_\theta(X) = \infty$ for $0 < \gamma \le 1$, for $\gamma > 1$, it is computed by



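$$ H_\theta(X) = -\log a + \sum_{j=2}^{\infty} \frac{a}{j (\log j)^{\gamma}} + \sum_{j=3}^{\infty} \frac{a (1 + \gamma) \log(\log j)}{j (\log j)^{1+\gamma}}. \qquad (40) $$

Lower bounding the two sums by integrals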



H 9 {X) > -log a + ^ + f ) + 7) (1 + 7 ln(log3)) ^ H e (X), 

7-1 7 2 (log3)T 

v ' ~ — ev ' 2 3(log3) 1+ T 



(41) 
(42) 



Tighter bounds (on both $a$ and $H_\theta(X)$) can be obtained by numerically summing more components
of the sum, and using the integral bounds only on partial sums. The pattern entropy is bounded
as follows:



Theorem 4 Let n — > 00. Then, for 6 in 

(I +o i li).<|f^(log§) 1 - 7 



0(1+7) 
2 



(1 + o(l)) • <! a(ln 2) In log n + 2a 
tf*P0- (l + o(l)) aln2 



1+7 In log 3 I+7 In log n 
(log 3)t (logn)T 
1+ln log 3 1+ln log n 
log 3 log n 



(7-l)(logn) 



7—1 ; 



j , for 7 < 1, 

/or 7 = 1, 
for 7 > 1 . 



(43) 



Theorem 4 shows that the per-symbol average $H_\theta(\Psi^n)/n$ is still finite even when $H_\theta(X) = \infty$.
Specifically, for $\gamma < 1$ it is $\Theta((\log n)^{1-\gamma})$, and for $\gamma = 1$, it is $\Theta(\log \log n)$. (For $\gamma < 1$, a looser lower
bound of the same order of magnitude was independently shown in [3].) The bounds in (43) for
$\gamma \le 1$ include second order terms. For $\gamma < 1$, while asymptotically in $n$ these terms are negligible,
they are not negligible for $\gamma \to 1$ (the 1/2 factor in the logarithm of the first term), and for $\gamma \to 0$
(the last terms). Additional second order terms for $\gamma < 1$ that are negligible even in these cases are
$-\log a$, and for an upper bound, the last two terms of (42). For $\gamma > 1$, $H_\theta(\Psi^n)/n$ asymptotically
equals $H_\theta(X)$ but decreases from $H_\theta(X)$ by $\Theta(1/(\log n)^{\gamma-1})$.

Figure 2: Bounds on $H_\theta(\Psi^n)$ (left) and on $nH_\theta(X) - H_\theta(\Psi^n)$ (right) vs. $\gamma$ for different values of
$n$ for the distribution in (38). Subscript $\infty$ indicates an asymptotic bound of Theorem 4.

Figure 2 shows the asymptotic bounds of Theorem 4 in (43) as well as non-asymptotic bounds
(which are derived in the proof of Theorem 4 in Section 6) for different $\gamma$ and $n$. Curves are shown
for bounds of $H_\theta(\Psi^n)/n$ (left) and of $nH_\theta(X) - H_\theta(\Psi^n)$ (right). As Theorem 4 and Figure 2
show, for small $\gamma$, (38) decays very slowly. This results in infinite $H_\theta(X)$ for $\gamma \le 1$, but also in
a very significant decrease of $H_\theta(\Psi^n)$ from $nH_\theta(X)$, where specifically $H_\theta(\Psi^n)$ is finite even for
$\gamma \le 1$. While $H_\theta(X)$ in this region is dominated by small probabilities, $H_\theta(\Psi^n)$ is dominated by
the larger ones. The decrease between the two is thus dominated by the fact that small probability
symbols rarely repeat. As $\gamma$ increases, (38) decays faster, the process is dominated more by the
larger probabilities, and the decrease from $nH_\theta(X)$ to $H_\theta(\Psi^n)$ becomes asymptotically negligible,
yet still significant for practical $n$.

4.2.2 The Zipf Distribution - A Fast Decaying Distribution Over the Integers 

Now, consider the Zipf (or zeta) distribution over the integers (see, e.g., [18], [19]) given by

$$ \theta_j = \frac{1}{\zeta(1 + \gamma)\, j^{1+\gamma}}, \qquad j = 1, 2, \ldots \qquad (44) $$

where $\gamma > 0$, and $\zeta(1 + \gamma)$ is the Riemann zeta-function (see, e.g., [3]), given by

$$ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \frac{1}{\Gamma(s)} \int_0^{\infty} \frac{x^{s-1}}{e^x - 1}\, dx \qquad (45) $$

for $s > 1$, where $\Gamma(s)$ is the Gamma function. Approximating $\sum_j \theta_j = 1$ by integrals

$$ \frac{\gamma 2^{\gamma} + 1}{\gamma 2^{\gamma}} \le \zeta(1 + \gamma) \le \frac{\gamma 2^{\gamma+1} + \gamma + 2}{\gamma 2^{\gamma+1}} \le \frac{1 + \gamma}{\gamma}. \qquad (46) $$

Figure 3: Bounds on $nH_\theta(X) - H_\theta(\Psi^n)$ vs. $\gamma$ for $n = 10^3$ (left) and vs. $n$ for different values of $\gamma$
(right) for the Zipf distribution in (44). Subscript $\infty$ indicates an asymptotic bound of Theorem 5.
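
The integral bounds on $\zeta(1+\gamma)$ can be verified against a numerical bracket of the zeta function. The sketch below assumes the bounds read $\frac{\gamma 2^\gamma + 1}{\gamma 2^\gamma} \le \zeta(1+\gamma) \le \frac{\gamma 2^{\gamma+1} + \gamma + 2}{\gamma 2^{\gamma+1}} \le \frac{1+\gamma}{\gamma}$, which is what one obtains by keeping one or two leading terms of the series and bounding the remainder by an integral; the function names are illustrative.

```python
import math


def zeta_bracket(s, terms=100_000):
    """Bracket zeta(s), s > 1: explicit partial sum plus integral bounds on the tail."""
    partial = sum(n ** (-s) for n in range(1, terms + 1))
    g = s - 1.0
    tail_lo = (terms + 1) ** (-g) / g  # integral of x^(-s) from terms + 1 to infinity
    tail_hi = terms ** (-g) / g        # integral of x^(-s) from terms to infinity
    return partial + tail_lo, partial + tail_hi


def bounds_eq46(gamma):
    """Lower, middle and upper expressions bounding zeta(1 + gamma)."""
    lower = (gamma * 2 ** gamma + 1) / (gamma * 2 ** gamma)
    middle = (gamma * 2 ** (gamma + 1) + gamma + 2) / (gamma * 2 ** (gamma + 1))
    upper = (1 + gamma) / gamma
    return lower, middle, upper


if __name__ == "__main__":
    for gamma in (0.5, 1.0, 2.0):
        print(gamma, bounds_eq46(gamma), zeta_bracket(1 + gamma))
```

For $\gamma = 1$ the bracket closes on $\zeta(2) = \pi^2/6 \approx 1.645$, which lies between 1.5 and 1.75 as the bounds require.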

The Zipf distribution is very common in natural language and rare event modeling. The pattern
entropy for the Zipf distribution is thus specifically interesting in compressing patterns of a
previously unobserved language. It can also be used for estimation of the number of letters or words in
a language by applying methods such as in [8], but on a Zipf distribution instead of a uniform one.
Unlike the distribution given in (38), for every $\gamma > 0$, the distribution in (44) has a fixed entropy
rate. Bounding sums by integrals (separating leading terms)



logC(l + 7)+ (1 + ' ] 



C(l+7) 



1 log(3^e) 

2 x +7 7 2 37 



± He(X) < Hg(X) < He(X) + c( ( j ± ^ ^ ■ (47) 



The pattern entropy is bounded as follows:

Theorem 5 Let $n \to \infty$. Then, for $\boldsymbol{\theta}$ in (44),

$$ H_\theta(\Psi^n) = nH_\theta(X) - \Theta\left( n^{\frac{1}{1+\gamma}} \log n \right). \qquad (48) $$

More precisely, 

nH e (X) - ( 1 + - - 1 ) (1 + o(l)) log n < (49) 

V 7 3(i + 2 7 )y (1 + 7 ) . c(1 + 7 )— 

H e (* n ) < nH 9 (X)-(l-- + -- ] ) (1 - o(l)) logn. 

V e 7 2(1 + 2 7 ); ( 1 + 7 ). C ( 1 + 7 )i+^ 

As $\gamma$ increases, (44) decays faster, and the decrease from $nH_\theta(X)$ to $H_\theta(\Psi^n)$ is more negligible,
because fewer letters with large enough probabilities dominate the process. For small $\gamma$, $H_\theta(X)$ is
large and is dominated mainly by symbols with relatively small probabilities. Since such symbols
rarely repeat, $H_\theta(\Psi^n)$ is closer to 0, and the decrease from $nH_\theta(X)$ is thus very significant. This
behavior resembles that of uniform distributions with $k \gg n$. The coefficients 1 in the lower bound
and $1 - 1/e$ in the upper bound reflect the effect of symbols with probabilities close to $1/n$ that may
or may not occur. The remaining coefficients reflect decrease in entropy due to very low probability
symbols, which are unlikely to occur in $X^n$. Figure 3 shows the asymptotic bounds of Theorem 5 in
(49) as well as non-asymptotic bounds (which are derived in the proof of Theorem 5 in Section 6) for
different $\gamma$ and $n$. The gaps between the asymptotic and non-asymptotic behaviors are greater for
smaller $\gamma$ and smaller for greater $\gamma$. For small $n$, second order terms are more significant. However,
for larger $n$, the gaps between the bounds become negligible. (Specifically, only for $\gamma = 0.01$, the
asymptotic curves do not overlap the non-asymptotic ones on the right graph. For such low $\gamma$,
curves for lower and upper bounds do overlap.)



4.2.3 Geometric Distribution 

The geometric distribution, which decays faster than the preceding distributions, is given by

$$ \theta_j = p (1 - p)^{j-1}, \qquad j = 1, 2, \ldots \qquad (50) $$

where $0 < p < 1$. It has a fixed entropy rate $H_\theta(X) = h_2(p)/p$, where $h_2(p)$ is the binary entropy
function. Its pattern entropy is bounded as follows.
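
The closed form $H_\theta(X) = h_2(p)/p$ for the geometric distribution $\theta_j = p(1-p)^{j-1}$ can be confirmed by summing $-\theta_j \log \theta_j$ directly. The sketch below truncates the sum once the terms become negligible; the truncation threshold and the function names are assumptions made for illustration.

```python
import math


def h2(a):
    """Binary entropy function in bits."""
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)


def geometric_entropy_closed_form(p):
    """i.i.d. entropy rate of the geometric distribution, h2(p)/p, in bits."""
    return h2(p) / p


def geometric_entropy_direct(p, tol=1e-15):
    """Direct summation of -theta_j * log2(theta_j), truncated when theta_j <= tol."""
    total = 0.0
    theta = p  # theta_1 = p
    while theta > tol:
        total -= theta * math.log2(theta)
        theta *= 1 - p  # theta_{j+1} = theta_j * (1 - p)
    return total


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.8):
        print(p, geometric_entropy_closed_form(p), geometric_entropy_direct(p))
```

For $p = 0.8$ and $n = 10$ this gives $nH_\theta(X) \approx 9.02$ bits, and for $p = 0.01$ and $n = 10$ it gives a value above 80 bits, in agreement with the figures quoted in this subsection.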



Theorem 6 Fix p. Let n — * 00 and let 5 > (In 20) /(In Inn). Then, for in 150\) . 

(1 + 5) 2 (los Inn) 2 
nHe(X) - { \{ { , L - C L1 (p)(l + S) log Inn - C L2 (p) - O 



H e (* n ) < nH d (X) 



21og-r^ 

l-p)h 2 (p) 



1 



1 



log 



2e(2-p) 



2(2 -p)p 



1 + 



2(2 -p)p \{l — p) log j^- J (log log 



log log log n 
log log n 

^ + 



log n 
1 



< 



n) 



log log n 
1 



+ 



(log n) (log logn) 2 



where 
Cli(p) 

C L 2(P) 



log p 5 + 2p — 2.5p 2 



log(l — p) 3p(2 — p) 
5 + 5p - V , 1 (1 - P? ( 1 2 (p 2 ~ 2 P + °- 5 ) 

log 1 7T- 1 



log 



3p(2 — p) p 

21o §7r 



p 



1 — p 



3(2 -pf 



log 



1 — p 



+ 



(51) 

(52) 
(53) 



log-r- 

O 1 — P 



bg,max (p) 

+ £ log 

6=2 



2 log 



(b-l) % /I=p 



log-r^ 

& l— p 



+ log 



^(p)+4(pr 

4(p) 



and 



J g,max 



(P) 



2 1 1 



K(p) + K(p) < 



6.9 log e 



+ 1 



#(P) < 



1.4 log e 
1 ^ + 1 



(54) 
Figure 4: Bounds on $nH_\theta(X) - H_\theta(\Psi^n)$ vs. $p$ for $n = 10^3$ (left) and vs. $n$ for different values of $p$
(right) for the geometric distribution. Subscript $\infty$ indicates an asymptotic bound of Theorem 6;
the subscript "simple" refers to an upper bound with $U \ge 0$ in (25).

Theorem 6 shows that $H_\theta(\Psi^n)$ diverges from $nH_\theta(X)$ by at most $\Theta[(\log \log n)^2]$, and if $p$ is smaller
(for $n \to \infty$, $p < 0.69$), by at least $\Theta(1/(\log \log n))$. Due to the very slow rates, second order terms
are necessary in (51) for more accurate approximations. The proof of Theorem 6 presented in
Section 6 is used to obtain numerical bounds even for relatively small $n$. Figure 4 and Table 1
show the asymptotic bounds of Theorem 6 and the tighter non-asymptotic bounds for different $p$
and $n$. The small bounds are very sensitive to the parameters, which are numerically chosen.
Hence, at larger $p$, where the bounds are small, "ringing" appears due to quantization. A
larger choice of $\delta$ above ($\delta > (\ln(20/p^2))/(\ln \ln n)$) will eliminate the last three expressions in (53)
of the asymptotic bound. However, it will not result in a tighter asymptotic curve.

Due to the fast decay of (50), the decrease of $H_\theta(\Psi^n)$ from $nH_\theta(X)$ is much smaller than
in the preceding cases. Yet, for smaller $p$, (50) decays slower, and $nH_\theta(X) - H_\theta(\Psi^n)$, although
negligible w.r.t. $nH_\theta(X)$ for sufficiently large $n$, is still large. Furthermore, it is not negligible w.r.t.
$nH_\theta(X)$ for smaller $n$. Table 1 demonstrates that. For example, for $p = 0.01$, even for $n = 1000$,
$nH_\theta(X) - H_\theta(\Psi^n)$ is over 10% of $nH_\theta(X)$. For $n = 10$, $H_\theta(\Psi^n) < 2.28$ while $nH_\theta(X) > 80$. On
the other hand, for $p = 0.8$, $nH_\theta(X) - H_\theta(\Psi^n)$ is at most 18.66 for $n = 10^{10}$.

As shown in Figure 4 and Table 1, the bounds on $nH_\theta(X) - H_\theta(\Psi^n)$ are relatively insensitive
to $n$ for greater values of $n$. This implies that the decrease in the entropy effectively occurs during
the first indices. This is also implied by the diminishing decrease from $nH_\theta(X)$ on the right hand
side of (51). While the true rate of $nH_\theta(X) - H_\theta(\Psi^n)$ may be between those of the lower and upper
bounds, diminishing decrease of $H_\theta(\Psi^n)$ from $nH_\theta(X)$ is possible. Fast decaying distributions may
effectively behave like distributions over small alphabets, and the gain in $H_\theta(\Psi^n)$ is only due to
occurrences of new indices. Once these become sparse, we may have $H_\theta(\Psi_\ell \mid \Psi^{\ell-1}) > H_\theta(X)$, thus
possibly decreasing the gap between $H_\theta(\Psi^n)$ and $nH_\theta(X)$ (as discussed in Subsection 4.3).
Table 1: Bounds on $H_\theta(\Psi^n)$ for different (finite) $n$. The gap is $nH_\theta(X) - H_\theta(\Psi^n)$.

  p      n         nH(X)        H(Psi^n) upper   H(Psi^n) lower   gap lower   gap upper
                   5728         5630             5124             98          604
         10^4      57280        57182            56348            98          932
  0.8    10^1      9.02         8.96             5.26             0.06        3.76
         10^2      90.24        90.16            82.4             0.08        7.84
         10^3      902.41       902.34           893.15           0.07        9.26
         10^10     9.02*10^9    9.02*10^9        9.02*10^9        0.07        18.66

4.2.4 Linear Monotonic Distributions 

The monotonic distributions considered above were all over infinite alphabets. Consider a
monotonic distribution over a finite alphabet, whose probabilities increase linearly. An example of such
a distribution is given by

$$ \theta_i = \frac{2 (i - 0.5) \lambda^2}{n^2}, \qquad i = 1, 2, \ldots, k = \frac{n}{\lambda}, \qquad (55) $$

where $\lambda$, $0 < \lambda \le n$, is a parameter. This parametrization is very similar to that of the uniform
distribution in Theorem 3, but here the distribution is monotonically increasing. For $\lambda = 1$, $k = n$,
and $\theta_i < 2/n$ for all $i$. If $\lambda \gg 1$, $k = o(n)$, and if $\lambda \ll 1$, $k \gg n$. The i.i.d. entropy rate of (55) is

$$ H_\theta(X) = \log \frac{n}{\lambda} + \log \frac{\sqrt{e}}{2} + O\left( \frac{\lambda}{n} \log \frac{n}{\lambda} \right) \qquad (56) $$

where the last term is negligible unless $\lambda = O(n)$ (i.e., $k = O(1)$). The pattern entropy of the
distribution in (55) is as follows:



Theorem 7 Let $n \to \infty$, let $\delta > 0$ be fixed arbitrarily small. Then, for $\boldsymbol{\theta}$ in (55),



nH e (X)-o{l), if\>nTi +s 

nH e {X)-(l + o(l))l\og^, ^/^<A<n^ 5 (57) 
[ (l + (l))C A Anlogf, ifX<i 



where $(1 - 2\lambda/3) \cdot 2/3 < C_\lambda < 2/3$.
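As a numerical check of the linear parametrization and its entropy rate, the following sketch builds $\theta_i = 2(i-0.5)\lambda^2/n^2$ for $i = 1, \ldots, k = n/\lambda$ and compares its exact entropy with the two leading terms $\log(n/\lambda) + \log(\sqrt{e}/2)$. It assumes that $n/\lambda$ is an integer and drops the $O(\cdot)$ term; the helper names are hypothetical.

```python
import math


def linear_distribution(n: float, lam: float) -> list:
    """theta_i = 2 (i - 0.5) lam^2 / n^2 for i = 1..k, with k = n / lam.

    Assumes n / lam is (numerically) an integer.
    """
    k = round(n / lam)
    return [2.0 * (i - 0.5) * lam ** 2 / n ** 2 for i in range(1, k + 1)]


def entropy_bits(probs) -> float:
    """Shannon entropy in bits of a finite probability vector."""
    return -sum(q * math.log2(q) for q in probs if q > 0.0)


def entropy_leading_terms_bits(n: float, lam: float) -> float:
    """log(n / lam) + log(sqrt(e) / 2), i.e. the entropy rate without the O(.) term."""
    return math.log2(n / lam) + math.log2(math.sqrt(math.e) / 2.0)


if __name__ == "__main__":
    n, lam = 1000, 1.0
    theta = linear_distribution(n, lam)
    print("sum of probabilities:", sum(theta))
    print("exact entropy (bits):", entropy_bits(theta))
    print("leading terms (bits):", entropy_leading_terms_bits(n, lam))
```

For moderately large $k$ the exact entropy and the leading terms are expected to agree closely, which illustrates why the last term of the entropy rate only matters when $k$ is bounded.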

Figure 5 shows the bounds for two regions of $\lambda$, and compares them to the bounds in Corollary 1
of a uniform distribution. The curves include second order terms shown in the proof of Theorem 7
in Section 6. Also, more complex bounds (not shown in Section 6 for brevity) obtained using
Theorems 1 and 2 for the boundary between the last two regions are used. When $k = o(n^{1/3})$
(first region), there are no letters with very small probabilities. All letters are distributed away
from each other, such that at most a single letter populates a bin. Hence, $H_\theta(\Psi^n)$ hardly decreases
from $nH_\theta(X)$. When $k = o(n)$ (and is in the second region), first occurrences of letters with large
probabilities dominate the decrease from $nH_\theta(X)$ to $H_\theta(\Psi^n)$. The behavior is very close to that
of Corollary 1. However, each parameter gains $\log(n/\lambda^{3/2})$ bits instead of $\log k = \log(n/\lambda)$ (e.g.,
if $\lambda = k = \sqrt{n}$, instead of $0.5 \log n$, the gain here is $0.25 \log n$). In the last region, $H_\theta(\Psi^n) =
O((n^2/k) \log n)$. This order of magnitude, again, equals that of a uniform distribution.

Figure 5: Bounds for the linear distribution in (55) and the uniform distribution with $\theta_i = \lambda/n$ on
$H_\theta(\Psi^n)/n$ vs. $\lambda$ with $n = 10^3$ (left), and on $[nH_\theta(X) - H_\theta(\Psi^n)]/k$ for $n = 10^{10}$ (right).
Legend: $H_\theta(X)$ Linear; $H_\theta(\Psi^n)/n$ UB-Linear; $H_\theta(X)$ Uniform; $H_\theta(\Psi^n)/n$ UB-Uniform;
$H_\theta(\Psi^n)/n$ LB-Uniform.



4.3 Small Alphabets 

While $H_\theta(\Psi^n) \le nH_\theta(X)$, it is not guaranteed that $H_\theta(\Psi_\ell \mid \Psi^{\ell-1}) \le H_\theta(X)$, or even that
$H_\theta(\Psi_{n_0+1}^n \mid \Psi^{n_0}) \le (n - n_0)H_\theta(X)$ for some $n_0 < n$. Following the chain rule

$$\begin{aligned}
H_\theta(\Psi_{n_0+1}^n \mid \Psi^{n_0}) &= H_\theta(\Psi^n) - H_\theta(\Psi^{n_0}) \\
&= [H_\theta(X^n) - H_\theta(X^n \mid \Psi^n)] - [H_\theta(X^{n_0}) - H_\theta(X^{n_0} \mid \Psi^{n_0})] \\
&= (n - n_0)H_\theta(X) + H_\theta(X^{n_0} \mid \Psi^{n_0}) - H_\theta(X^n \mid \Psi^n).
\end{aligned} \tag{58}$$

For a larger $n > n_0$, it is not guaranteed that $H_\theta(X^n \mid \Psi^n) > H_\theta(X^{n_0} \mid \Psi^{n_0})$. In fact, for a
smaller alphabet and small $n_0$, the opposite may be true, because the longer pattern may have
less uncertainty of which symbols correspond to which indices. This argument is in concert with
the proof of Theorem 7 in [7] and Proposition 4 in [4], which show that for a smaller alphabet,
as $n \to \infty$, $H_\theta(\Psi_{n+1} \mid \Psi^n) \ge (1 - o(1))H_\theta(X)$. This is true for $n \to \infty$, as long as $\theta_i > 1/n^{1-\varepsilon}$,
$\forall i \le k$, for an arbitrarily small $\varepsilon$.
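A short worked instance may make this concrete. The derivation below is a sketch for a binary alphabet with letter probabilities $(\theta, 1-\theta)$, taking $n_0 = 1$ and $n = 2$; the numeric values are rounded and are added here only for illustration.

```latex
\begin{align*}
H_\theta(\Psi_1) &= 0
  && \text{(the first index is always 1)} \\
P_\theta(\Psi_2 = 2) &= 2\theta(1-\theta)
  && \text{(the second letter differs from the first)} \\
H_\theta(\Psi_2 \mid \Psi^1) &= h_2\bigl(2\theta(1-\theta)\bigr) \ge h_2(\theta) = H_\theta(X)
  && \text{since } \min(\theta, 1-\theta) \le 2\theta(1-\theta) \le \tfrac{1}{2} \\
\theta = 0.3: \quad h_2(0.42) &\approx 0.981 > h_2(0.3) \approx 0.881 .
\end{align*}
```

So already the second index can carry more than $H_\theta(X)$ bits, while the first index carries none.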

Figure 6: $H_\theta(\Psi_\ell \mid \Psi^{\ell-1}) - H_\theta(X)$ for a binary alphabet as function of the bit probability $\theta$ (left),
and for a ternary alphabet as function of $\theta_1$ and $\theta_2$ (right).

Opposite behaviors, where $H_\theta(\Psi^n) < nH_\theta(X)$ but $H_\theta(\Psi_\ell \mid \Psi^{\ell-1}) > H_\theta(X)$ for $\ell > n_0$ for
some $n_0 \ge 1$, occur for smaller alphabets because the decrease in the block entropy is dominated
by first occurrences. Once the dominant symbols in the distribution occur, the remainder of $X^n$
consists mainly of reoccurrences, where no decrease in entropy is exhibited. Such a behavior can
also extend to fast decaying distributions that, while still over infinite alphabets, may only have a
small subset of the alphabet symbols that will effectively occur in a sequence, such as the geometric
distribution. Figure 6 shows $H_\theta(\Psi_\ell \mid \Psi^{\ell-1}) - H_\theta(X)$ for a binary and a ternary alphabet. In
the binary case, the decrease of $H_\theta(\Psi^n)$ from $nH_\theta(X)$ is the sole result of the first index, where
$H_\theta(\Psi_1) = 0$. All remaining indices have $H_\theta(\Psi_\ell \mid \Psi^{\ell-1}) \ge H_\theta(X)$. Thus $nH_\theta(X) - H_\theta(\Psi^n)$
diminishes toward 0 for $n > 1$. As shown in Figure 6, a ternary alphabet exhibits a similar behavior,
except that $H_\theta(\Psi_\ell \mid \Psi^{\ell-1}) > H_\theta(X)$ for the first time at a larger $\ell$. The value of that $\ell$ depends
on the parameters of $\theta$. Pattern entropies shown in Figure 6 were computed precisely using

E(M«.. B ..M*))-nc fl -K«( e n# M( 4 <») 

n x i=l {<T(n x )j=l ) 

where $\mathbf{n}_x = (n_x(1), n_x(2), \ldots, n_x(k))$ is the occurrence vector of the alphabet symbols in $x^n$, the
outer sum is taken over all such vectors, and the inner sum is taken over all $k!/(k - |\mathbf{n}_x|)!$ nonzero
element permutations $\sigma(\mathbf{n}_x)$ of the occurrence vector, where $|\mathbf{n}_x|$ is the cardinality of nonzero
components in $\mathbf{n}_x$. Conditional entropies were then computed with the first equality in (58).
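For very small alphabets and short blocks, the same quantities can also be obtained by direct enumeration. The sketch below is not the occurrence-vector computation described above: it is a brute-force substitute that enumerates all $k^n$ sequences, accumulates pattern probabilities, and then takes differences of block entropies as in the first equality of (58). All function names are hypothetical.

```python
import itertools
import math
from collections import defaultdict


def pattern(seq):
    """Map a sequence to its pattern: indices in order of first occurrence."""
    index_of = {}
    out = []
    for symbol in seq:
        if symbol not in index_of:
            index_of[symbol] = len(index_of) + 1
        out.append(index_of[symbol])
    return tuple(out)


def entropy_bits(probs) -> float:
    """Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in probs if q > 0.0)


def pattern_block_entropy_bits(theta, n: int) -> float:
    """H(Psi^n) by enumerating all len(theta)**n sequences (small cases only)."""
    prob = defaultdict(float)
    for seq in itertools.product(range(len(theta)), repeat=n):
        prob[pattern(seq)] += math.prod(theta[s] for s in seq)
    return entropy_bits(prob.values())


def conditional_index_entropies(theta, n_max: int):
    """H(Psi_l | Psi^(l-1)) for l = 1..n_max as differences of block entropies."""
    blocks = [0.0] + [pattern_block_entropy_bits(theta, l) for l in range(1, n_max + 1)]
    return [blocks[l] - blocks[l - 1] for l in range(1, n_max + 1)]


if __name__ == "__main__":
    theta = (0.3, 0.7)
    h_x = entropy_bits(theta)
    for l, h in enumerate(conditional_index_entropies(theta, 8), start=1):
        print(l, round(h - h_x, 4))
```

The enumeration cost grows as $k^n$, so this is only practical for checking the binary and ternary behavior at short block lengths.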



5 Uniform Distributions - Proofs 

Proof of Corollary 1: Corollary 1 results directly from Theorems 1 and 2. The lower bound of
(31) is that of (1), resulting also from (13) and (14). The upper bound follows directly from (25)
or (26) with (27), where $R'_b = 0$, and the second term of (27) does not exist because there is no $\theta_i$
in a bin which differs from the average bin probability. The lower bound of (32) follows from (13)
with $H^{(01)}_\theta(X) = S_1 = S_4 = 0$. Then, from (20),



77 

S 2 > - I A - 1 + e 



(a) 

> 



1 



A 



i n 
log A 



n _\ n 
re log — — Ae log — 
A A 



(60) 



where (a) follows from e A2//n > 1 — A 2 /n. Then, from (|2ip . 



<S 3 > (loge) £ (Loi 



(a) 
> 



1 



8=1 

~ A ) 2 loge 



A 



1 



n 



~ A ) loge 



(61) 

2A 2 v ; 

where (a) follows from the lower bound in (9). Summing (60)-(61) yields the lower bound of (32).
The upper bound follows from (25), where only $R'_1$, upper bounded by (28) using the lower bound
on $L_1$ in (9), is not zero. The lower bound in (33) follows from (60)-(61) with $\lambda = 1/n^{\mu+\varepsilon}$.
Expressing exponents by their Taylor series,

u+e _ A1 °ge 



S 2 + S 3 > (l-^^logn^ e + (l-A)^loge-Alog 
;-o(A + i))^log(en^) 



rv 



1-0 



1 t 1 

n H-l+E n 



n 



2-p-e 



log (en^ +£ ) . 



(62) 



The upper bound follows from (25), where only $R'_0$, which is bounded by (29), is not zero. □



Proof of Theorem 3: For the lower bound



H e (* n ) 



(b) n 
= n log — 
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> n log — 
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log 
log 
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n (e A -A-l)loge 
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A Ae A 



(63) 



Equality (a) computes the average cost of repetitions (the first term) and that of first occurrences 
(the second term). Then, rearrangement of the second term leads to (b) by using 1 — mX/n = 
(A/n) • (re/A — m) and EK\ = L\. Inequality (c) is by Jensen's inequality. Next, (d) is obtained 
from Q, and finally, Stirling's approximation 



$$\sqrt{2\pi m}\left(\frac{m}{e}\right)^m \le m! \le \sqrt{2\pi m}\left(\frac{m}{e}\right)^m \cdot e^{1/(12m)} \tag{64}$$
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and Taylor expansion of e x ' = 1 — 0(l/n) are used to obtain (e), proving the lower bound. 
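The two-sided Stirling bound used in this step can be checked numerically. The sketch below assumes the standard form $\sqrt{2\pi m}(m/e)^m \le m! \le \sqrt{2\pi m}(m/e)^m e^{1/(12m)}$ and works with base-2 logarithms to avoid overflow; the function names are illustrative.

```python
import math


def stirling_bounds_log2(m: int):
    """Return (lower, upper) bounds on log2(m!) from the two-sided Stirling bound."""
    lower = 0.5 * math.log2(2.0 * math.pi * m) + m * math.log2(m / math.e)
    upper = lower + math.log2(math.e) / (12.0 * m)
    return lower, upper


def log2_factorial(m: int) -> float:
    """log2(m!) computed through the log-gamma function."""
    return math.lgamma(m + 1) / math.log(2.0)


if __name__ == "__main__":
    for m in (1, 2, 5, 10, 100):
        lo, hi = stirling_bounds_log2(m)
        print(m, round(lo, 6), round(log2_factorial(m), 6), round(hi, 6))
```

The gap between the two bounds is $\log_2(e)/(12m)$ bits, so it vanishes as $m$ grows, which is why the approximation only contributes second order terms.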



To prove the upper bound, the pattern entropy is upper bounded by the average description
length of a code that assigns probability $p_1 = \lambda'/(\alpha n)$, where $\lambda' = \max(\lambda, 1)$ and $\alpha > 1$ is a
parameter, to a repeated index, and the remaining yet unassigned probability to a new index.
Using this code,

(a) "/ V / \'\ 

He^l < {n-L 1 )log^-Y J Pe{Ki=j)Y,M 1 - r ^) 

j=l m=0 ^ ' 
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(65) 



Inequality (a) holds since the entropy is upper bounded by the average description length of the code,
which consists of the cost of repetitions (first term) and first occurrences (second term). The bound
in (b) is under the worst case assumption that all $n/\lambda'$ symbols occurred. The first occurrence of
the last new index is assigned probability $1 - 1/\alpha + \lambda'/(\alpha n)$, and those of the preceding indices are
assigned this probability plus increments of $\lambda'/(\alpha n)$, depending on the occurrence time. This step
produces a tighter bound than in (25). Next, (c) follows Jensen's inequality and the concavity of
$-\log(x!)$. Finally, (d) follows Stirling's approximation and the bound in (9) on $L_1$. □
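The code used in the upper bound can be sketched as follows. This is a minimal illustration under the reading that every previously seen index receives probability $\lambda'/(\alpha n)$ and a new index receives whatever probability remains; the function name and the error handling are assumptions, not part of the proof.

```python
import math


def code_length_bits(pattern, lam: float, alpha: float) -> float:
    """Description length in bits of a pattern under the sequential code.

    Each repeated index costs -log2(lam' / (alpha * n)); a new index costs
    -log2 of the probability left after serving all indices seen so far.
    """
    n = len(pattern)
    lam_prime = max(lam, 1.0)
    p_repeat = lam_prime / (alpha * n)
    seen = 0  # number of distinct indices observed so far
    total = 0.0
    for index in pattern:
        if 1 <= index <= seen:
            total += -math.log2(p_repeat)
        elif index == seen + 1:
            p_new = 1.0 - seen * p_repeat
            if p_new <= 0.0:
                raise ValueError("no probability left for a new index")
            total += -math.log2(p_new)
            seen += 1
        else:
            raise ValueError("not a valid pattern")
    return total


if __name__ == "__main__":
    # Pattern of "lossless" with illustrative parameters lam = 1, alpha = 2.
    print(code_length_bits((1, 2, 3, 3, 1, 4, 3, 3), lam=1.0, alpha=2.0))
```

Because the probabilities assigned at each step sum to one, the resulting lengths satisfy Kraft's inequality, so the expected length of this code upper bounds the pattern entropy.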



6 Monotonic Distributions - Proofs 



6.1 Slowly Decaying Distribution Over the Integers 



Proof of Theorem 4: Let $j_0$ and $j_1$ be the indices of the greatest $\theta_j \le \eta_1, \eta_2$, respectively. Then,
substituting $x_b = \alpha n^{1+\varepsilon_b} (\ln 2)^{1+\gamma}$, it can be verified that
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(66) 



The value of $\beta_b$ can be found numerically. It is constant for large enough $n$, and as $n \to \infty$, it
approaches 1. Thus $j_b = O\left(n^{1+\varepsilon_b}/(\log n)^{1+\gamma}\right)$. Using an integral to approximate a sum
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Similarly, 
where (b) follows from approximating sums by integrals, substituting the value of $\varphi_{01}$ from (68)
and including terms equal to the last two of the upper bound in (42) in an $O(1)$ term. Then, (c)
follows from substituting $j_1$ from (66) with $\varepsilon = \Theta((\log \log n)/(\log n))$, and absorbing all second
order terms. Note that second order terms resulting from $j_1$ in this step, which are absorbed in
other terms, are negligible w.r.t. the terms expressed above even if $\gamma \to 0$ or $\gamma \to 1$. A similar
derivation follows for $\gamma = 1$, except that the second term in step (b) is replaced by the proper value
of the integral as shown in (43). In a similar manner, for $\gamma > 1$,
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a(l + 7) log log j 
j(logj) 1+ T 

a(l + 7) (1 + 7lnlog ji) 
7 2 (logji) 7 



+ 



1 



ii(iogji) 7 

(71) 



(7-l)(logn)T-i 

where (b) follows from approximating sums by integrals, and (c) from substituting $j_1$ from (66) and
absorbing second order terms, realizing that the dominant decrease emerges from the third term.

Next, we lower bound the first sum in (20) and $S_3$ by 0. Then, choosing $\varepsilon_0 = 0$,



(a) 



jo-l 



S 2 > M ~ l) lo § ^ - (jo - ji)) log (voi™ 1 ^) ( = } 



3=31 



n log log n 
(logn)T 



(72) 
where (b) follows from $\theta_j \le 1/n^{1-\varepsilon}$ in bin 1, and (c) from (66) and (69) with the choice of $\varepsilon$ and $\varepsilon_0$
above. (Note that a tighter nontrivial bound for the second sum of (20) can also be obtained, but
has a negligible effect.) Next, using the trivial bound of (14),

S 1 <log(j 1 \) = o( 7 ^-]. (73) 



v (log nfi , 

Similarly, with a proper choice of constant for $\varepsilon = \Theta((\log \log n)/(\log n))$, $S_4 = O(j_1)$. Adding
the bounds above for all terms of (13) (normalizing $S_1$, $S_2$, and $S_4$ by $n$) results in lower bounds
satisfying (43) for all regions of $\gamma$, where, regardless of $\gamma$, the expression is dominated by $H^{(01)}_\theta(X)$.

The bounds obtained above are asymptotic. To derive the numerical bounds in Figure 2 for
finite $n$, steps (a) of (70), (71), and (72) are used to compute sums (where dominant components of
the sums are added, and remaining, sometimes infinite, partial sums are approximated by integrals).
The value of $\varepsilon$ is numerically tested for different values, and $\varepsilon_0 = 0$ is used. The precise expression
in (22) is computed for each $\varepsilon$. Then, the $\varepsilon$ that gives the maximal bound for each $\gamma$ and $n$ is used.
Roughly, $\varepsilon \approx 1.7(\ln \ln n)/(\ln n)$ produced the tightest lower bounds.
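To get a feel for the magnitude of this rule of thumb, the sketch below tabulates $c \cdot (\ln \ln n)/(\ln n)$ for a few block lengths. It only evaluates the rough fit quoted above with $c = 1.7$; it does not reproduce the numerical optimization itself, and the function name is hypothetical.

```python
import math


def epsilon_rule_of_thumb(n: float, c: float = 1.7) -> float:
    """Evaluate c * ln(ln n) / ln n, the rough fit for the optimizing epsilon."""
    return c * math.log(math.log(n)) / math.log(n)


if __name__ == "__main__":
    for n in (1e3, 1e6, 1e10):
        print(f"n = {n:.0e}: epsilon ~ {epsilon_rule_of_thumb(n):.3f}")
```

The value decays slowly with $n$, so for practical block lengths the resulting $\varepsilon$ is far from negligible.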

Asymptotically, (26) is sufficient to obtain an upper bound on $H_\theta(\Psi^n)$. A choice of $\varepsilon = 0$ yields
identical bounds to those in (70)-(71) for $H^{(01)}_\theta(X)$. Then, the trivial bound $U \ge 0$ is used. Finally,

^ j ^J 2 ^3? +21 ~ (ji-l)(logJi) 2 +^ Vn(logn) 1+ ^' 1 ) 

where (a) follows from an integral upper bound, and (b) from (66), yields, using (29),

V(logn)V 

Combining the terms of ([26]) from ([70]) - ([7T]) and ([75|) yields upper bounds satisfying (j4"3j) dominated 
by Hq 01 \x). The numerical upper bounds in Figure [2] can be obtained using these terms, where 
precise expressions from steps (a) of ([70]) . (fTTj) and from ([29]) are used to obtain Hq 01 \x) and i?^, 
respectively. Slightly tighter bounds can be obtained using ([25]) . where i?' and are bounded 
separately, and £0 is numerically optimized to minimize the bound. (These are the bounds shown 
in Figured) □ 



6.2 The Zipf Distribution 



A 



Proof of Theorem 5: For convenience, let $\alpha = 1/\zeta(1 + \gamma)$. Let $j_0$ and $j_1$ be the indices of the
greatest $\theta_j \le \eta_1, \eta_2$, respectively. Then,



Jb 



1 i+^h 

Q/l + 7 . n ! + T 



0,1. 



(76) 



Similarly, for b > 2, define jb = max|l, [a ■ n 1+£2 /(b' + l) 2 ) 1+7 X as the index of the greatest 

Qj < %+i (where b' is as defined in ([I])-©). Note that jb = 1 for b' > Van 1+£2 — 1, = j^-i — jb, 
and some bins may be empty. From (|76p and bounding a sum by an integral, 



^0 



J=J0 



7Jd 



1 + ^ 

Jo 



(77) 
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Similarly, 



a a I 7 

—7 < V01 < —7 1 + — 



a 



Vl = V01 - Vo > —7 
7Ji 



Jo 



-I I ! - 



Jo 



(78) 



From f[76j) -<[78 |) . it follows that 



nipo 
nipoi 



7 1 

yT+7 



• n 



1 + 7E 



• n 



!+t + O (n £ ) 



While /co, &01 = 00, it follows from (|76p that 

fci = Jo - Jl = Jo 



1 — O in !+t 



(79) 



(80) 



The lower bound of (|13p can be derived for the distribution in (|44p by separately bounding its 
terms. First, 53 > 0. Then, nHg 01 \x) + S2 is lower bounded, and Si and S4 upper bounded. 



nHf 1) {X) + S 2 > n ^(X)-n ^%log^--^lo 



(a) 



V01 



Jo-1 



.7 =.70 



3=3i 
s 



^01 



Vi 



+ (l- — --)-f fclog* 

3=30 



V 2 
V01 



(81) 



^3 



where (a) follows from lower bounding (|20p . the definition of Hg >1 \x) in (lllj) . and from combining 
terms. Note that the summand of ([8]) can be inserted to the summand of V2 above to provide a 
tighter numeric expression. Now, 



V1 + V2 = (nipp + j - ji) log — + (1 + 7) ^ log j + (1 + 7) an • ^ 



V01 
a 



io-i 



log j 



J=J1 



3=30 



1+7 



(82) 



(6) 



< ("Vo + Jo - Ji) log — + (1 + 7) log ^7 + (1 + 7) OLTl 



Jo! 



log jo , log jo , loge 



a 



Jl! 



,•1+7 

Jo 



+ 



■7 

7Jo 



+ 



2 -7 

7^Jo 



•1+7 

< (nvo + Jo - Ji) log + (1 + 7) 

a 



nyp log e 

7 



(jo - ji)loge 



+• O ( ji log 2° 
Ji 



where (a) follows from the definitions of (fo and and (6) follows from bounding the sum in the 
last term by an integral. The lower bound on ip$ in ([77]) leads to (c). A choice of Eq = leads to 
the minimal tradeoff between rupo and jo in the dominant term of (|82p . The smallest possible e will 
minimize the bound in (|82|) . By Theorem [H this value is constrained to e = [(log log n) / (logn)] 
to guarantee sufficient rate of S4. Using ([75]) and ([79]) for jj, and respectively, this yields 



(X !+7 1 

Ul + V2 < (1 + o(l)) logn. 

7 



Bounding sum by an integral 

V 3 > 



a 2 n 2 j 



2(1 + 2 7 )j 



2+27 




• log 



V01J0 
a 



1+7 



+ 



a 2 n 2 (l +7) 
2(1 + 2 7 ) 2 



Jo 



2+27 



• log e. 



(83) 



(84) 
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Using the substitutions above for e an d e, and plugging ([83]) and ([Ml) into ([HTj) . 



i 

r (oi)/,. 1 1 \ a 1+7 



ntf^PO + ^ > nff„(I) - I 1 + - - 3(1 + 27) j YT^i (1 + n ~ l0gn ' (85) 

Optimization that also includes the bound on $V_3$ in (84) yields a slightly greater optimal $\varepsilon_0 > 0$
(roughly between 0.1 and 0.2) that produces the maximal overall lower bound on $nH^{(01)}_\theta(X) + S_2$.
However, this bound, while more complex, only negligibly gains on the one in (85) with $\varepsilon_0 = 0$.

Since ji = k — koi, using the simple bound in (Q3]) on Si, plugging e = G [(log log n) / (logn)] 
5*1 < log (ji!) = O (ji log ji) = O fn 1 ^ logn) = o (n~ logn) . (86) 



In a similar manner, S4 = o log nj. Combining (f85j) . (f86j) . 53 > 0, and the bound on S4 into 

(fT3j) yields the lower bound in ([Ml). 

The lower bound in (|49p is asymptotic. To obtain precise curves as in Figure [3] for finite n, 
jo and ji are computed with (f76j) . Then, either ([77|) - ([7H|) can be used to bound (po and 9901 1 
or they can be computed precisely substituting jo and j\. Step (6) of ([82]) and (|84]) are used 
to provide a bound on nH^ l \x) + S'2, and more precise bounds are obtained on S\ and S4. 
(Alternatively V2 can be computed precisely as discussed following (|8~T1) .) To obtain bounds on Si, 
let Lb = max |l, (a • n 1_e / (6 + l) 2 ) |, 6 = 0, 1, . . .; be the index of the greatest 9j, such that 
6j < ib+i- Then, n\ = lq — ii, and 



1 l-E 

a 1 +in 1 +~r 6 



H = i-b-2 ~ t-b+i < — • 77 — 1U1 , \ +1; 6 = 2, 3, ... . (87) 

(5_1)— (6-l)(l+7) 

This implies that only for 

, . l+x 

I 6 \ ^_ i=£ / ^_ 

< • a 3 +~< ■ n 3 +~t +l = o( n 3 +T 

\l + 7/ V 

there may be more than a single letter in the bins surrounding bin b resulting in nonzero summands 
in (|16p . Similar derivations can be performed to generate the elements of the sum in (|15p . and more 
precise bounds on S4 using ([22]) . Bounds are obtained for different values of e, and the value that 
attains a maximum is used for every 7 and n. Note that S4 trades off with V\ + V2 by requiring 
a greater e to guarantee that e' n in (|23p diminishes. The choice of $ + and also influences the 
tradeoff (a smaller f? + — $~ decreases the dominant term of S4 in (|22h ). Roughly, the optimal value 
of e leading to the curves in Figure [3] equals 1.75 to 2 times (In In n)/(ln n) for large enough n. The 
curves in Figure were produces with i9~ = e~ L97 , and ?9 + = e 0,98 , that lead to /($ _ ,?9 + ) > 0.2. 

To derive a tight upper bound, (25) is used, where $\eta$ is built with $\varepsilon = 0$ and $\varepsilon_0, \varepsilon_2 > 0$. This is
necessary for a tight bound on $R'_1$ and a negligible one on $R'_0$. First,

j'o-l 2 00 2 

nffi (X) = nH (X) + V n& log -±- + V ri0,- log 

3=31 3=30 

00 , 

= nH e (X) -nipilog n^olog (1 + J)an > — — 

a a ^-^ v ' 

3=31 

< nH 6 (X) - jvp-i log n^o log (1 + j)an < — ^- + 

a a I jj{ Yj{ 

(c) / 1+7 \ / 1 + 7 

< nHg{X) — (1 — o(lj) nipi log I nipie t ) — n</Jo log I nt^e t 



™ -J- (1+7) 

< nHo(X) --ni+T-log ri ho n x +T logn (89) 

7(1 + 7) ^/ 1 +7 \ / 

where (a) follows from (44) and the definition of $\alpha$, (b) follows from bounding a sum by an integral,
(c) follows from (78) and (76), and (d) follows from (79) absorbing second order terms.

To bound R[ and i? , similarly to (|77l) -fl78l. 

E~ 2 ^— ^ a 1 or Ji 1 a'+m^ 

°i = 2^ J2T27 " ^2 + fl 2 N -2+27 " ^2 + (i + 2 7 )n 2 ' ( ^ 

oo -J— 

E~ 2 1 a^n'+i 

ff i - n 2+2 £0 + (1 + 27)n 2 + 2e " ' ^ -* 

J=io 

From ([25} and ([90])- |EE]), it follows (using fc a < j and (f78"]) ) that 



i i i i i 

* £ " + 1^+^ + ■ '° s 2e<1 + ?*"" + ° (92> 

* = o |'=^52 I . (93) 



While $R'_0$ requires a greater $\varepsilon_0$ to minimize its contribution to the bound, $R'_1$ requires a smaller $\varepsilon_0$
(which implies that $k_1$ is smaller). Trading off, a choice of $\varepsilon_0 = \Theta[(\log \log n)/(\log n)]$ is optimal.

Finally, for bin b > 2 of rj, 

h = jb-i ~ 3b = (1 + o(l)) a^n 1 ^ L_^. (94) 

W+ 7 (b , b + 1 ) 1+J J 

where b' b is the index in rj' as defined preceding ([5]) . Specifically, since 772 = 

1/n, b[ =n £2 / 2 (l + o(l)). 
Following ([9J and 9j > r/2, we have L b > (1 — l/e)A;&. Using ([27|) and £2 = © ( (log log n)/ log n), 



U > (1 + o(l)) • J] L 6 log ^ > (1 + o(l)) • - ~) • £ k b log 

fe>2 ' ^ ' b>2 

Since 6' fe > b\ — ► 00 as n — > 00, 



P- 1 ^**. (95) 



1 1+E2 

a l +7n l +7 ^ 



fc 6 >(l + o(l))-— — 3+7. (96) 

1 + 7 b'— 
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Plugging both ([M]) and ([96]) in ([95]) . 

(1 — l/e)a T +^n 1+ ~< (3 + 7)a~n j +t ^-^fog^'h 



1 1+E2 1 1+^2 

(l-l/e)a~ 

r - : — - 
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(a) / 1\ a^ni+i / 



C7 > (l + o(l))- fl--V \J2 k b l °2 

V 6 ' I b>2 1 " ^ ' " b>2 b' U ~ 



e(l + 7) (1 + 7) 2 



i i 



> (1 + o(l)) -1 • logn — O (e2n 1+ ^ logn 

\ ej 1 + 7 \ 

i i 

(b) ( l\ a 1+ Tn 1+ T 

= (l + o(l))- (l-^j- 1 + 7 log^ (97) 

where (a) follows from (|94p and the telescopic property of in b' b , since = n 62 and £2 = 
((log logn)/ logn), and by approximating the second sum by an integral. Then, (6) follows again 
from the value of £2- 

Now, substituting ([55]). pZ ]) -([9"3" ]) . and (97J) in {25]) yields the upper bound in ([39"]) . Again, for 
the numerical bounds in Figure [3l if}, and j'b are computed and then used with step (6) of ([89]) and 
with ([90]) - ([9T]) and ([29]) . For a tighter bound on i?^, ([28]) can also be used directly where L\ is 
computed with ([8]). Then, C7 is bounded with ([27]) using ([M]) to compute fc;, and ([9]) to compute 
Lft. Finally, for each 7 and n, values of £0 and £2 that minimize the bound are chosen. The value of 
£0 is large for smaller n, and decreases with n, roughly following the curve of (In Inn) /(Inn). This 
concludes the proof of Theorem [5j □ 



6.3 Geometric Distribution 



Proof of Theorem 6: Let $j_0$ and $j_1$ be the indices of the greatest $\theta_j \le \eta_1, \eta_2$, respectively. Then,



log! 1 ] 
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Jb 



l+e b 



logT^ 



< : ( T , 6 = 0,1. 



(98) 



<p = J2p(i-py- 1 = (i-p) i °- 1 . 



For b > 2, define % = maxjl, ["log {pn 1+£2 / [(6' + 1) 2 (1 — p)l } /log[— (1 — p)]] } as the index of 
the greatest Oj < n 6+1 (where b' is as defined in ([I])-©). Note that jb = 1 for b' > ^pn 1+£2 — 1, 
k[, = — jb, and some bins may be empty. From 



(99) 
(100) 

(101) 



3=00 



Similarly, 

Vol = (1 VI = V01 -<po = V01 {l - (1 -p) jo ' jl } 

From p8]) - ([T00]) . it follows that 



pn 1+£ 
1— p 1— p 
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Vo 



pn x pn 1 + £r l — — pn 1+e l pn 1 e ' 

While fco, koi = 00, if pn 1+£l > 1 — p, it follows from ([98|) that 

log (1 - p)] < = _ < ^g ^ 
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log(l -p)' 
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Similarly to (pl) - (fT00|) . 



O OO O 



3 2-p' 

3=30 3=31 



2-p' ^ J 



£2 = P mi ~ W («) Woi_ (l _ (1 _ p )3*l 



2-p 



2-p 



(103) 



where (a) follows from fl99]), (ffiEfl) and (fT02l) . 

Now, the lower bound of (|13j) can be derived by separately bounding its terms. First S3 > 0. 
Then, nHg (X) + S 2 is lower bounded, and 5i and S4 upper bounded. Lower bounding (|20|) . 



nHf l) {X) + S 2 > nHp(X)+[l 



(oi). 



3n £ o n / 2 ^ J 6 fl ^Z^ 



<£oi 



io-i / 

- 1 1 log 
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3n e ° n 



V 3 



(104) 



where (a) follows from the definition of Hg 01 \x) in (llip and from combining of terms. Each 
component Vi is now bounded. By definition of 6j, 



Vi = ntp Q \og — +np(l -p) [log(l - p)] S~] (j - 1) (1 - p) 3 2 

<A)i ^ 

3=30 



= nipolog— +n[log(l -p)] < 
<A)i 



(6) i^o nip h 2 (p) 

= nipo log 
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p 2 n £ ° 



'e + e )logn 



(105) 



where (a) is obtained by representing each term of the sum as a derivative of $(1 - p)^{j-1}$ w.r.t.
$(1 - p)$, exchanging order of summation and differentiation, and computing a geometric series
sum, (b) follows from (99), and (c) follows from the upper bounds of (101) because the expression
decreases with $\varphi_{01}$, and for $\varphi_0 \le \varphi_{01}/e$ also with $\varphi_0$. From (100)



(1-pV'i ^ l 

V2 = (jo - h ) log hVj log — 

p r^r 1 — 
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( = } it! (o.Bfea log 
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" -21og(l-p) 



1 1 , 1 
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(logn) 2 + 0.5 + 



logp 



log(l - p) 



(e + e) log n + log - (106) 
V 



where (a) follows from computing the sum in the second term and using the definition of $k_1$ in
(102), and (b) follows from the upper bound on $k_1$ in (102). Applying similar techniques to those
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in ([TMD, 



« 2 / P^O , VOl . (1 - V? 2 



2 l2-p log Wo + (2-p) 



Vo lo g Y" 
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(e + e) log n 



(e + e) log n + log - + 



1 , /(1-P) 2 



p \p(2 — p) 



1 log 



1 — p 



n 



2eo 



(107) 



where (a) follows from the lower bounds in (|10ip - (|102p . 

To bound Si, let if, be the index of the greatest 9j, such that 9j < £b+i- Similarly to 



if, = max < 1, 



log 



pn 



(6+lp (1-p) 



6 = 0,1,.... 



Hence, «j = to — ^2 < — 2(log3)/log(l — p) + 1, and 

21og£±f / . x 

H = l b-2 ~ tb+i < - — ^j— + 1; 6=2,3,..., mm [B^, V pn 1 " e /(l - p) - 2 J 



(108) 



(109) 



For b > 2, > k^+i- Hence, the maximum bound is obtained for 6 = 2, k' 2 < —4/ log(l — p) + 1. 
Only as long as n' b > 2, elements of the sum in (1161) are nonzero. This is only possible as long as 



6 < 



2 + 



J g,max ■ 



(110) 



Since — koi = j\ — 1, from (fT7|) . e„ < min {l, njie °- ln }. Combining these bounds, using (fl~6j) . 



Si < (l-e n )< 



log 



21og^= 

lo gT J- 

& 1— p 



fc=2 



21o g7r ^±^ 



+e n log(ji!) + h 2 [min (0.5, e n )] . 



Ill) 



To guarantee that the last two terms diminish at $O[(\log n)^2 (\log \log n)/n]$ (since $j_1 = O(\log n)$),
$\varepsilon \ge (1 + \delta)(\log \ln n)/(\log n)$, where $\delta > (\ln 20)/(\ln \ln n)$, must be used, and then, $S_1 = O(1)$.

An upper bound on S4 is derived similarly to that on Si. Choosing $~ = e -5 ' 5 and $ + = e 1,4 , 



K < 



log I 1 
logT^ 
log?? + 
log'T^ 



+ 1 



+ 1 



6.9 log e 
logT^ 



+ 1 



"6i>n- 



log^ 



1.4 log e 

f- + l 



logT^" 

& 1 — r 



logT^ 
O (log n) . 



(112) 
(113) 

(114) 



Plugging these values in (|22p with the choice of e above yields S4 = 0(1), where all terms of (|22p 
but the first diminish with n. (The bound can be tightened by narrowing \&~ , Such narrowing 
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is limited to decreasing f(-d~,'d + ) in (|23p . such that it still produces diminishing terms in (|22p .) 
Note that if £ is redefined by £ = {0, l/n 1 " £ } |J ^Oj : 9j > l/n 1— e V and e is chosen above with 
5 > (ln(20/p 2 ))/(lnlnn), a bound of Si = o(l) can be obtained. This means that for n — > oo each 
letter of 9 is in a single bin by itself. A similar approach yields £4 = o(l). This approach, however, 
results in a larger first term in an overall usually looser lower bound in (I5ip . 

Combining (104)-(107), (111), and (22) gives a lower bound on $H_\theta(\Psi^n)$. Choosing $\varepsilon_0 = 0$ and
$\varepsilon = (1 + \delta)(\log \ln n)/(\log n)$ with $\delta > (\ln 20)/(\ln \ln n)$ yields the lower bound of (51).

To numerically compute a lower bound for a finite n with parameters e and £o> Jo an d ji are 
computed by ([98]) . Then, (j99j) - (|100p are used to compute ipo and </?oi- Step (6) of fj 105|) and the first 
equality of (1107P are used to compute V\ and V3, respectively. Instead of using (|106p . the summand 
of ([8]) is included in the summand of V2 in (I104p . and V 2 is precisely computed. This is necessary 
for tighter bounds for very small n as shown in Table [U Bin count b g;inax used in (lllip to bound 
5i must be taken as the minimum between its value in (jllOp and min (^B^, pn l ~ £ / [\ — p) — 2^ . 
Asymptotically, the bounds of (fl"4"|) and (fT5j) are looser than that of (fl"6j) because they produce 
bounds of 0((logn)(loglogn)) and O(logn) on Si, respectively. However, for practical n, using 
these bounds may sometimes produce tighter bounds. The tightest bound for Si among those 
resulting from (jl4l) - (ll6l ) can be used for each p, e, and n. The sum in (|15p is bounded similarly 
to the sum in (lllll) . where the ratio (6 + 2)/ (Jo — 1) in (lllip is replaced by (b + l)/b to bound 
K b ; b= 1, 2, ... ,b gtmax = min j^, ^ pn x ~ e j (1 - p) - 1, 1/((1 -p)~ - 5 - Last, S4 is bounded 
with (|22p . numerically computing (|112|) - f)114j) . For given p and n, e and £0 are numerically optimized 
to give the tightest bound, resulting in the non-asymptotic curves in Figure 0] and the values in 
Table [TJ While asymptotically negligible, Si dominates the bound for small p and large n. Using 
precise expressions instead of bounds on V% yields better bounds with larger eq. Parameter e 
decreases with n, roughly following the curve of 1.5(lnlnn)/(lnn). 

To derive a tight upper bound, (|25p is used, where rj is built with e < (ei > 0). Nonnegative Ei 
is necessary to obtain negligible R' , yet reducing the rate of R[. A simpler bound can be obtained 
by using U > 0. The remaining terms of (|25p are bounded below. First, 
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< nH e (X) 5— hn^olog- 



p^n £1 ' (^0 

( c ) . , (l-p)h 2 (p) 1 en e °- £l 

' —3-^ + log 

p z n £1 pn e ° 1 — p 



^ - il + — lQ g -r— ■ (115) 



where (a) follows from the same reasons as (a)-(b) in (105), (b) follows from (101) and Taylor
expansion on the last term, and (c) follows again from (101).
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From ([291) and CESD, 
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where (a) again follows from (jlOip . In a similar manner, 
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*■ 5 2(2^)'" '^-^■ 1 ° g ^(^g) 
(«) py&n 2 2e(2-p)fci 



^ *W01 (l " (1 _ p) 2 n 2 (eo - ei ) 



(6) 1 

< 



2(2 - p)pn 2ei 



2e(2-p)(e -e 1 )(l- , log(1 r, p) ) 
ei log n + log log n + log 



+° ( ) ( 117 ) 



(1 - p) log ^ 

1 

where (a) follows from (^oi — — ^oi' < Vol) and since v?o/Voi = (1— p)- 7o-J1 < 1/[(1— p)n £o ~ £l ] 
following (l99l)- (fT00l) and (fT0"2~]) . and (6) follows from bounding ip 01 with (fTUTJ) and fci with (fT02l) . 
Note that the logarithmic bound on k\ reduces the rate of R^. This is the reason that two separate 
bins with positive Eq and e% are used. With proper choices of these parameters, R' becomes 
negligible, yet, bin holds most symbols, leaving only a logarithmic number of symbols in bin 1. 

Summing (|115p . (|116p and (|117p . a parametric upper bound on Hg (fy n ) is obtained. Substi- 
tuting a constant to Eq, and letting s\ = (log log log n) /(log n), where £q — e\ < 1, gives the upper 
bound of (|5ip . The dominant terms are the first two of (|115p and those of (|117p . and R' is neg- 
ligible. The upper bound can be tightened by lower bounding U of (f25|) using (p7|) . The limits 
of the sum and its elements can be lower bounded in a similar manner to the derivation for S± in 
(fT08l) - (fTTTjh Since L b > k b (l - e~ n0 j) = 0(l/n Sl ) = O (l/(log log n)) when k b > 2 this does not 
change the rate of the bound. This additional term was used together with the last equality of (I115|) 
and the first inequality of (]116p to produce the non-asymptotic bounds in Figure HI where, again, 
E b were numerically optimized. Instead of using (|117p . the value of R[ was computed precisely with 
(|28p . where L\ was computed with (|8j). This was necessary to achieve tight bounds for small n as 
shown in Table 1. The "simple" bound in Figure 4 does not include the $U$ term. For very small $p$,
this term does generate more significant gain. For example, for $n = 10^5$ and $p = 0.01$, out of at
least 1561 bits of decrease from $nH_\theta(X)$, 1017 result from the term $U$ (i.e., multiple letters in bins
$b > 1$ of $\eta$). However, for greater $p$ the gain from $U$ diminishes, because very few bins $b > 1$ (if
any) contain more than a single letter. □



6.4 Linear Monotonic Distributions 



Proof of Theorem O Let £2 = £ = © ((log log n)/ log re) < J/2. Let % be the smallest i, such 
that 6i > an d ib be the smallest i, such that 8i > Hence, 



2A 2 



£ 1 

- + 



1,2, 



frfn 1 " 62 1 
2A 2 + 2 



6 = 3,4, 



(118) 
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where b' b is the proper index in rj' corresponding to index b in rj (as defined in ©), £2 = Hi an d 
i = 4 = (Vi 1 -* /^ 2 ) + 0-5] . It follows that 



<A)i 



f 0; if A > Vn 1+£ 

S(^-l) 2 = f£(l + 0(^)); iff <A<^ (119) 
v 1; if A < 0.5n £ . 



In the first region, $\lambda > n^{2/3+\delta}$, implying $k < n^{1/3-\delta}$. The trivial upper bound $H_\theta(\Psi^n) \le
nH_\theta(X)$ is used. From (119), $\varphi_{01} = 0$. Hence, $S_2, S_3, S_4 = 0$, and $H^{(01)}_\theta(X) = H_\theta(X)$. Only $S_1$ remains
for using (fT3j) . Since k£ = i{, + 2 — let 

' ^> + 2 ) 2 -(6-l) 2 ) = ^(&+l) (120) 



6 2A 2 ^ ' v y ; A 2 V 26, 

be the unrounded value computed to obtain $\kappa'_b$. We must have $\tilde{\kappa}'_b \ge 1$ so that a summand in the 
dominant sum of (16) is not zero. This implies that such summands only exist for $b \ge b'_{\min}$, where 

$$
b'_{\min} \ge \frac{\lambda^2}{3n^{1+\varepsilon}}(1 + o(1)) \overset{(a)}{\ge} \frac{1}{3} \cdot n^{1/3+2\delta-\varepsilon}(1 + o(1)) \qquad (121)
$$

where (a) follows from $\lambda > n^{2/3+\delta}$. However, for the maximal probability, $\theta_k = 2\lambda^2(k - 0.5)/n^2 > b_{\max}^2/n^{1-\varepsilon}$. 
Thus the maximal populated bin has index $b_{\max} < \sqrt{2\lambda/n^{\varepsilon}} < \lambda^2/(3n^{1+\varepsilon})$, where the 
last relation follows from $\lambda > n^{2/3+\delta}$ and since $\varepsilon \ll \delta$. Using (16), this implies that $S_1 = o(1)$. 
Combining all terms of (13), $H_\theta(\Psi^n) \ge nH_\theta(X) - o(1)$. 

For the second region, S3 > 0, and lower bounding (fT9l) . 

ii— 1 

^ 01) (X) + S 2 > nHe(X) - ^ log ^ = nff fl (I) - O (ii logn) . (122) 

i=i ' 

Similarly to (120), for large $b$, 

$$
\kappa_b = i_{b+1} - i_b = (1 + o(1)) \frac{n^{1+\varepsilon} b}{\lambda^2}. \qquad (123)
$$

Defining $\tilde{\kappa}_b$ similarly to $\tilde{\kappa}'_b$ but w.r.t. $\kappa_b$, and requiring $\tilde{\kappa}_b \ge 2$ for terms in the sum of $S_1$ leads to 
$b_{\min} \le 2\lambda^2/n^{1+\varepsilon}$, where $b_{\min}$ is defined as $b'_{\min}$ but w.r.t. $\kappa_b$. Using (15), the sum of $S_1$ is 

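$$
\begin{aligned}
\sum_{b=1}^{b_{\max}} \log(\kappa_b!)
&\overset{(a)}{=} \sum_{b=b_{\min}}^{b_{\max}} \kappa_b \log\frac{\kappa_b}{e} + \frac{1}{2} \sum_{b=b_{\min}}^{b_{\max}} \log \kappa_b + O(b_{\max}) \\
&\overset{(b)}{=} (1 + o(1)) \cdot \left[ (k - i_{b_{\min}}) \log\frac{n^{1+\varepsilon}}{e\lambda^2} + \frac{n^{1+\varepsilon}}{\lambda^2} \sum_{b=b_{\min}}^{b_{\max}} b \log b \right] \\
&\overset{(c)}{=} (1 + o(1)) \cdot \left[ (k - i_{b_{\min}}) \log\frac{n^{1+\varepsilon}}{e\lambda^2} + \frac{n^{1+\varepsilon}}{\lambda^2} \left( \frac{b_{\max}^2}{2} \log\frac{b_{\max}}{\sqrt{e}} - \frac{b_{\min}^2}{2} \log\frac{b_{\min}}{\sqrt{e}} \right) \right] \\
&\overset{(d)}{\le} (1 + o(1)) \cdot \left[ \frac{n}{\lambda} \log\frac{n^{1+\varepsilon}}{e\lambda^2} + \frac{n}{\lambda} \log\sqrt{\frac{2\lambda}{e n^{\varepsilon}}} \right] \\
&= (1 + o(1)) \cdot \frac{n}{\lambda} \log\frac{\sqrt{2}\, n^{1+\varepsilon/2}}{(e\lambda)^{3/2}} \qquad (124)
\end{aligned}
$$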
where (a) follows from Stirling's approximation in (64), (b) from (123), (c) from approximating 
the sum by an integral, and (d) since $k = n/\lambda$, $i_{b_{\min}} \le 2\lambda^2/n^{1+\varepsilon}$ (which follows from (118)), and 
$b_{\max} \le \sqrt{2\lambda/n^{\varepsilon}}$. The terms that result from $i_{b_{\min}}$ and the lower limit of the integral are of second 
order. (By definition of the region, $\lambda^3 < n^{2-3\delta} \ll n^{2+\varepsilon}/2$, which implies that $i_{b_{\min}} \le 2\lambda^2/n^{1+\varepsilon} \ll 
n/\lambda = k$. The upper limit on $\lambda$ also results in $b_{\min} \ll b_{\max}$.) 
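
The bin occupancies in (123) and the leading term of (124) can be illustrated numerically. The sketch below is not part of the original proof. It assumes that bin $b$ of the grid holds the letters with $b^2 \le \theta_i n^{1-\varepsilon} < (b+1)^2$ (the exact indexing of the grid may differ from the one used in the paper), and it uses natural logarithms throughout, which does not affect the ratio being compared. All names are illustrative.

```python
import math


def bin_counts(n, lam, eps):
    """Count letters of the linear monotonic distribution in each grid bin.

    Assumption: bin b holds letters with b^2 <= theta_i * n^(1 - eps) < (b + 1)^2,
    where theta_i = 2*lam^2*(i - 0.5)/n^2 and k = n/lam.
    """
    k = round(n / lam)
    scale = n ** (1.0 - eps)
    counts = {}
    for i in range(1, k + 1):
        theta = 2.0 * lam**2 * (i - 0.5) / n**2
        b = int(math.sqrt(theta * scale))  # floor of sqrt(theta * n^(1 - eps))
        counts[b] = counts.get(b, 0) + 1
    return counts


def log_factorial_sum(counts):
    """Sum over bins of ln(kappa_b!), computed with the log-gamma function."""
    return sum(math.lgamma(kb + 1) for kb in counts.values())


def leading_term(n, lam, eps):
    """(n/lam) * ln( sqrt(2) * n^(1 + eps/2) / (e*lam)^(3/2) )."""
    return (n / lam) * math.log(
        math.sqrt(2.0) * n ** (1.0 + eps / 2.0) / (math.e * lam) ** 1.5
    )


if __name__ == "__main__":
    n, lam, eps = 10**8, 10**4, 0.05
    counts = bin_counts(n, lam, eps)
    c = n ** (1.0 + eps) / lam**2  # kappa_b is roughly c * b for large b
    print("b_max:", max(counts), "bound:", math.sqrt(2.0 * lam / n**eps))
    print("kappa_50 / (c * 50):", counts[50] / (c * 50))
    print("sum ln(kappa_b!):", log_factorial_sum(counts))
    print("leading term:", leading_term(n, lam, eps))
```

With these parameters the largest populated bin stays below $\sqrt{2\lambda/n^{\varepsilon}}$, and the sum of log-factorials is within a few percent of the leading term; the gap comes from the second-order terms discussed above.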

By definition in Theorem [T] and from (jll8p , 

a 1 _ 0(4l )_ (^!)ffl„(2) (125) 

where (a) follows from the choice of $\varepsilon$ and since $n^{\varepsilon} \ll n^{\delta}/2 < \lambda$. Following (125), the last term in 
(122) is $O(i_1 \log n) = o(S_1)$. Hence, combining all terms of (13) 

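$$
H_\theta(\Psi^n) \ge nH_\theta(X) - (1 + o(1)) \cdot \frac{n}{\lambda} \log\frac{3\sqrt{2}\, n^{1+\varepsilon/2}}{(e\lambda)^{3/2}} \qquad (126)
$$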



where the additional 3 in the argument of the logarithm follows from the second term of (15). With 
a choice of $\varepsilon = \Theta((\log\log n)/(\log n))$, this leads to the lower bound of (57) in this region. 

For an upper bound in the second region, $H_\theta^{(01)}(X) \le H_\theta(X)$. Using (28), 

i4 = O (n m log n) = O log n\ ® g) (127) 

where (a) follows from (119) and (b) from $n^{2\varepsilon} \ll \lambda$ in this region. A lower bound on $U$ is obtained 
following the same steps as (124), where $-\varepsilon_2$ replaces $\varepsilon$. Plugging $\varepsilon_2 = \Theta((\log\log n)/(\log n))$, using 
(26), yields the upper bound of (57). 

For the third region, let $\varepsilon = -\log(2\lambda)/(\log n)$. This leads to $\eta_1 = 2\lambda/n$. Hence, since 
$\theta_k < 2\lambda/n$, $H_\theta^{(01)}(X) = S_1 = S_4 = U = 0$. With looser bounding, also $S_3 \ge 0$. From (gQJ 



5. 



2 



> (i + »(D)-(i-y)4E^^ 



i=l 



2AA 2A 4 ^,. _, 2l n 2 



3/ ji 2 ^ 2(i-0.5)A 2 



i=l 

(a) ^ / 2A\ 2A 4 *; 3 e V3 n 2 

( ^ (i^(D)-(i-y)-^io g ^ (128) 

where (a) follows from approximating the sum by an integral and since $n \to \infty$, and (b) from 
substituting $k = n/\lambda$. To use (26), approximating a sum by an integral 

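
$$
\sum_{i=1}^{k} \theta_i^2 = (1 + o(1)) \cdot \frac{4\lambda^4}{n^4} \cdot \frac{k^3}{3} = (1 + o(1)) \cdot \frac{4}{3} \cdot \frac{\lambda}{n}. \qquad (129)
$$

It then follows using (29) that 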

2 3p?7 

i^i<(l + o(l))---Anlog^ r • (130) 

Since all other terms but $S_2$ for the lower bound in (13) and i?Q X for the upper bound in (26) are 
0 or bounded by 0, both bounds are proved from (128) and (130) for the third region of (57). □ 
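
The sum-to-integral step in (129) is easy to verify directly. The sketch below is an illustration only, assuming $\theta_i = 2\lambda^2(i - 0.5)/n^2$ for $i = 1, \ldots, k$ with $k = n/\lambda$; the function names are hypothetical.

```python
def sum_of_squares(n, lam):
    """Exact sum of theta_i^2 for theta_i = 2*lam^2*(i - 0.5)/n^2, i = 1..k, k = n/lam."""
    k = round(n / lam)
    return sum((2.0 * lam**2 * (i - 0.5) / n**2) ** 2 for i in range(1, k + 1))


def sum_of_squares_approx(n, lam):
    """Integral approximation of the sum: (4/3) * lam / n."""
    return 4.0 * lam / (3.0 * n)


if __name__ == "__main__":
    for n, lam in ((10**4, 100.0), (10**6, 10.0)):
        exact = sum_of_squares(n, lam)
        approx = sum_of_squares_approx(n, lam)
        print(n, lam, exact, approx, exact / approx)
```

The ratio of the exact sum to $4\lambda/(3n)$ equals $1 - 1/(4k^2)$, so the approximation is already very tight for moderate $k$.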

7 Summary and Conclusions 



Tight bounds on the entropy of patterns of i.i.d. sequences were used to provide asymptotic and 
non-asymptotic approximations of the pattern block entropies for several distributions. The finite 
block pattern entropy was approximated for blocks of data generated by uniform distributions and 
monotonic distributions. Monotonic distributions studied include slowly decaying distributions over 
the integers, the Zipf distribution, the geometric distribution, and a linearly increasing distribution. 
Specifically, the pattern entropy was bounded for distributions that have infinite i.i.d. entropy rates. 
Conditional next index entropy was studied for distributions over small alphabets. 